Statistical Measures: Variance, Covariance, and Causal Inference
Measures of Dispersion and Relationship
Variance
Variance: Measures how far the values of a random variable are spread out from their mean.
Covariance
Covariance: Measures the direction of the linear relationship between two variables.
- Cov = 0: No linear relationship (the variables may still be related nonlinearly).
- Cov > 0: Suggests Y will be above average when X is above average.
- Cov < 0: Suggests Y will be below average when X is above average.
The formula for variance is often expressed as: $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ (where $\mathbb{E}$ is the Expected Value).
The formula for covariance between two variables $X$ and $Y$ is: $\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$
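Both formulas can be checked numerically. A minimal NumPy sketch (the function names are illustrative) computes each quantity directly from its definition:

```python
import numpy as np

def variance(x):
    # Var(X) = E[X^2] - (E[X])^2, the population (ddof=0) variance
    x = np.asarray(x, dtype=float)
    return np.mean(x**2) - np.mean(x)**2

def covariance(x, y):
    # Cov(X, Y) = E[(X - E[X]) (Y - E[Y])]
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, so Cov(X, Y) = 2 * Var(X)
```

For these values, `variance(x)` is 1.25 and `covariance(x, y)` is 2.5, matching `np.var(x)` and the identity Cov(X, 2X) = 2 Var(X).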
Pearson's Correlation Coefficient
Standardizes covariance between -1 and 1:
Pearson’s Correlation Coefficient: $\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sigma_X\sigma_Y}$
- Tells us the strength of the linear relationship between two variables.
- The sign indicates the direction of the relationship.
Regression Coefficient
Indicates how much $Y$ changes, on average, as $X$ increases by one unit:
$\text{Reg}(X \text{ to } Y) = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}$
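Both the correlation coefficient and the regression coefficient are ratios built from the same covariance. A short sketch (illustrative names, assuming population moments):

```python
import numpy as np

def pearson_corr(x, y):
    # corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), always in [-1, 1]
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

def reg_coef(x, y):
    # Reg(X to Y) = Cov(X, Y) / Var(X): average change in Y per unit of X
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / x.var()

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 4.0, 7.0, 10.0]   # exactly y = 3x + 1
```

Since the relationship is exactly linear, `pearson_corr(x, y)` is 1 (maximal strength) while `reg_coef(x, y)` is 3 (the slope): correlation captures strength and direction, the regression coefficient captures magnitude.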
Expected Value and Counterfactuals
Expected Value
Expected Value: The average of a large number of independent realizations of a variable.
Counterfactual
Used to estimate the effect of a treatment or intervention by comparing the observed outcome to what would have happened if the treatment were different.
Fundamental Problem of Causal Inference
We cannot observe the effect on a unit of being in one state of affairs versus some other state of affairs, because all other states of affairs are counterfactual.
- Assumption: Equivalence in all pre-treatment factors implies equivalence in potential outcomes.
Counterfactual Dependence
$X$ causes $Y$ if and only if:
- $Y$ occurs when $X$ occurs, and
- $Y$ would not have occurred in the counterfactual world where $X$ did not occur.
If the counterfactual is true, there is a causal story.
Treatment Effect = $Y_1 - Y_0$. Mechanism = the way that a cause leads to an effect.
Inferential Errors
Common Inferential Errors
- Confounders/Common Cause: Lead to bias; require ensuring equivalence between treatment and untreated groups.
- Selecting on the DV: Choosing cases for study based on the value of the dependent variable (DV), e.g., only examining cases where the outcome occurred.
- Reverse Causation: The outcome $Y$ actually causes $X$, rather than the other way around.
Potential Outcomes Framework
Definitions
- Potential Outcomes: $Y_0$: outcome with no treatment; $Y_1$: outcome with treatment.
- Treatment Groups: $T_1$: Received treatment; $T_0$: Did not receive treatment.
- Expected Value: Theoretical quantity that tells us the true average for a random outcome that follows a specific structure.
- Empirical Average: Average using data we actually observe. Empirical average differs from expected value due to noise and bias.
Treatment Effect Averages
- Population Average Treatment Effect (PATE) = $\mathbb{E}[Y_{1i}-Y_{0i}]$
- Average Treatment Effect on Treated (ATT) = $\mathbb{E}[Y_{1i}-Y_{0i} | T = 1]$
- Observed Average Treatment Effect (ATE) = $\mathbb{E}[Y_1 | T=1] - \mathbb{E}[Y_0 | T=0]$
ATE vs. ATT
What we get is the ATE (observed); what we want is the ATT.
- How far from ideal: $\text{ATE} - \text{ATT} = \mathbb{E}[Y_0 | T=1] - \mathbb{E}[Y_0 | T=0]$.
- Confounders create the imbalances between ATE and ATT, leading to over/underestimation.
- If $\text{ATE} - \text{ATT} = 0$, the groups are perfect substitutes (apples-to-apples comparison).
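The identity $\text{ATE} - \text{ATT} = \mathbb{E}[Y_0 | T=1] - \mathbb{E}[Y_0 | T=0]$ can be verified in a simulated potential-outcomes world. This is a hypothetical confounded setup (all names and parameters are illustrative), where a confounder raises both the baseline outcome and the chance of treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder raises both Y0 and the probability of treatment,
# so treated units have higher baseline outcomes.
confounder = rng.normal(size=n)
y0 = confounder + rng.normal(size=n)        # untreated potential outcome
y1 = y0 + 2.0                               # true unit-level effect is exactly 2
treated = (confounder + rng.normal(size=n)) > 0

pate = np.mean(y1 - y0)                                # = 2 by construction
att = np.mean(y1[treated] - y0[treated])               # also 2 here
ate_obs = y1[treated].mean() - y0[~treated].mean()     # observed comparison
bias = y0[treated].mean() - y0[~treated].mean()        # E[Y0|T=1] - E[Y0|T=0]
```

Here `bias` is positive (the treated group had higher baselines), so `ate_obs` overestimates `att`, and `ate_obs - att` equals `bias` exactly, as the identity says.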
Goal of Estimation
Goal: Estimate the ATT (or PATE) using the ATE. Estimate the unobservable quantity with observable data.
- Estimand = ATT (expected value), the unobserved quantity we want.
- Estimate = ATE (empirical value).
A useful equation: Estimate = Estimand + Bias + Noise $\implies$ Correlation $\approx$ Causation + Bias + Noise
Bias and Noise
Bias
Bias: Systematic error; the estimate misses the estimand in a particular direction.
- Unbiased if, over time, the average value of estimates equals the estimand.
- Bias is non-zero whenever $\mathbb{E}[Y_0|T=1] \ne \mathbb{E}[Y_0|T=0]$.
- If the bias term is greater than zero, $\text{ATE} > \text{ATT}$: overestimate (positive bias).
- If the bias term is less than zero, $\text{ATE} < \text{ATT}$: underestimate (negative bias).
- Bias does not disappear as sample size increases.
- Preventing Bias = Randomization.
Noise
Differences between our estimand and our estimate that arise due to idiosyncratic facts about our sample.
- Represents the spread of estimates, sampling variation, a statistical inference problem.
- As the sample gets larger, there is less noise; the average of estimates converges on the estimand.
- Law of Large Numbers (LLN): As $N$ gets large, the average of the sample (ATE) approaches the true mean: $\text{Avg}(Y|\text{Sample}) \to \mathbb{E}[Y]$.
- Central Limit Theorem (CLT): The distribution of the estimate approaches a normal distribution, with the variance of the noise shrinking as $N$ grows.
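The LLN and the shrinking noise can be seen by simulation. This sketch (illustrative setup: draws from a normal with known variance) measures the spread of sample means at several sample sizes and compares it to $\sqrt{\text{Var}(Y)/N}$:

```python
import numpy as np

rng = np.random.default_rng(1)
var_y = 4.0  # draws are N(0, sd=2), so Var(Y) = 4

def sd_of_estimates(n, reps=2000):
    # Draw `reps` samples of size n and look at the spread of their means.
    means = rng.normal(0.0, 2.0, size=(reps, n)).mean(axis=1)
    return means.std()

# Spread of estimates shrinks as N grows, tracking sqrt(Var(Y)/N)
for n in (25, 100, 400):
    print(n, sd_of_estimates(n), np.sqrt(var_y / n))
```

At each $N$ the simulated spread closely matches the theoretical value ($0.4$, $0.2$, $0.1$), illustrating why quadrupling the sample halves the noise.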
Estimation Procedures and Significance
Estimator Properties
- Estimator: The procedure we apply to data to generate a numerical result.
- Unbiased: If by repeating our estimation procedure infinitely many times, the average value of our estimates would equal the estimand.
- Precision: If by repeating our estimation procedure, the various estimates would be close to each other.
Standard Error and Confidence Intervals
Standard Error (SE): The standard deviation of the estimate across repeated samples; how far we expect the estimate to be from the estimand. We expect the estimand to lie within about 2 SE of the estimate (the basis of a 95% CI).
- $\mathbb{E}[\text{noise}] = 0$, $\text{SD}[\text{Noise}] = \sqrt{\frac{\text{Var}(Y)}{N}}$ (if the estimator is unbiased).
95% Confidence Interval: If we applied the estimator infinitely many times, each time on a new sample of data, the estimand would be contained in the 95% confidence interval 95 percent of the time. It is not true that we are 95 percent confident that the true estimand lies in the specific 95% confidence interval calculated from one sample.
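The repeated-sampling interpretation can be checked directly: build the interval estimate ± 2 SE on many fresh samples and count how often it contains the true value. A minimal sketch (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, sd, n, reps = 5.0, 3.0, 200, 2000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)          # estimated standard error
    lo, hi = sample.mean() - 2 * se, sample.mean() + 2 * se
    covered += lo <= true_mean <= hi

coverage = covered / reps   # close to 0.95 across repeated samples
```

Roughly 95% of the intervals cover the estimand; any single interval either contains it or does not, which is why the "95% confident about this one interval" reading is wrong.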
Hypothesis Testing
Null Hypothesis ($H_0$): The hypothesis that some feature of the data is entirely the result of noise ($\text{ATT} = 0$).
P-value: The probability of observing a result at least as extreme as ours if the null hypothesis were true. If the p-value is less than 0.05, we have statistically significant evidence (at the 95% confidence level) that the relationship is real, and we reject the null hypothesis.
- Small p-value: The observed result would be unlikely under noise alone.
- High p-value: The observed result is plausible under noise alone ($\text{ATT} = 0$ might be true).
- If the null hypothesis of $\text{ATT} = 0$ is not in the CI, then we can reject the null, and the p-value must be small.
Statistical Significance: We can reject the null hypothesis at some prespecified level of confidence (typically, 95% confidence).
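Testing $H_0: \text{ATT} = 0$ typically reduces to a difference-in-means test. This sketch uses a normal-approximation (z) p-value rather than a full t-test, and all names and parameters are illustrative:

```python
import numpy as np
from math import erf, sqrt

def diff_in_means_test(y_treat, y_ctrl):
    # Difference-in-means estimate with a two-sided normal-approximation
    # p-value for H0: no treatment effect.
    y_treat, y_ctrl = np.asarray(y_treat, float), np.asarray(y_ctrl, float)
    est = y_treat.mean() - y_ctrl.mean()
    se = sqrt(y_treat.var(ddof=1) / len(y_treat)
              + y_ctrl.var(ddof=1) / len(y_ctrl))
    z = est / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # 2 * (1 - Phi(|z|))
    return est, se, p

rng = np.random.default_rng(3)
treat = rng.normal(1.0, 1.0, 500)   # simulated true effect = 1
ctrl = rng.normal(0.0, 1.0, 500)
est, se, p = diff_in_means_test(treat, ctrl)
```

Here the p-value is tiny, so we reject the null at the 95% level; equivalently, zero lies outside the interval est ± 2 SE, matching the CI/p-value link noted above.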
Experimental Design Considerations
Design Issues
- Blocking/Stratification: Dividing experimental subjects into different groups (typically groups believed to have similar potential outcomes) and then randomizing the treatment within each group.
- Noncompliance: When a subject chooses a treatment status other than the one to which it was assigned.
- Chance Imbalance: Despite random assignment, the treated and untreated groups differ in important ways because of noise.
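Blocking guards against chance imbalance by randomizing within groups of similar subjects. A minimal stdlib sketch (function and variable names are illustrative), assigning half of each block to treatment:

```python
import random

def block_randomize(units, block_key, seed=0):
    # Group units by a pre-treatment covariate, then randomize to
    # treatment/control *within* each block (half treated per block).
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_key(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = 1   # treated
        for u in members[half:]:
            assignment[u] = 0   # control
    return assignment

# Hypothetical: 20 subjects split into two blocks by a binary covariate.
subjects = list(range(20))
assign = block_randomize(subjects, block_key=lambda u: u % 2)
```

By construction each block contributes equally to treatment and control, so the blocking covariate cannot be imbalanced across groups.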
Statistical Power
Statistical Power: The probability of rejecting the null hypothesis of no effect if the true effect is of a certain non-zero magnitude.
- Lack of Statistical Power: The inability to detect a true effect if one exists (addressed by reducing noise or increasing $N$).
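Power can be estimated by simulation: repeatedly draw data with a known true effect and count how often the test rejects. A sketch under illustrative assumptions (normal outcomes, rejection when $|z| > 2$):

```python
import numpy as np

def power(effect, n, alpha_z=2.0, reps=1000, seed=4):
    # Estimated probability of rejecting H0 (|z| > alpha_z) when the
    # true effect has the given magnitude: simulated statistical power.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        treat = rng.normal(effect, 1.0, n)
        ctrl = rng.normal(0.0, 1.0, n)
        se = np.sqrt(treat.var(ddof=1) / n + ctrl.var(ddof=1) / n)
        z = (treat.mean() - ctrl.mean()) / se
        rejections += abs(z) > alpha_z
    return rejections / reps
```

For a fixed true effect of 0.5, `power(0.5, 20)` is modest while `power(0.5, 200)` is near 1: increasing $N$ (or otherwise reducing noise) is exactly how a lack of power is addressed.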
Other Issues
- Attrition: Experimental subjects drop out of the experiment, such that outcomes are not observed for those subjects.
- Interference: The situation where the treatment status of one unit affects the outcome of another unit (Contamination, Spillover).
Validity
- Internal Validity: The extent the experiment supports the claim about cause and effect; can we get an unbiased estimate of the treatment effect?
- External Validity: Generalization; what populations, settings, and treatment variables can this effect be generalized to?