Statistical Measures: Variance, Covariance, and Causal Inference
Measures of Dispersion and Relationship
Variance
Variance: Measures how far the values of a random variable are spread out from their mean.
Covariance
Covariance: Measures the direction of the linear relationship between two variables.
- Cov = 0: No linear relationship (the variables may still be related nonlinearly).
- Cov > 0: Suggests Y will be above average when X is above average.
- Cov < 0: Suggests Y will be below average when X is above average.
The formula for variance is often expressed as: $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ (where $\mathbb{E}$ is the Expected Value).
The formula for covariance between two variables $X$ and $Y$ is: $\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$
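Both formulas can be checked numerically. A minimal NumPy sketch (the function names are illustrative) computes each quantity directly from its definition:

```python
import numpy as np

def variance(x):
    # Var(X) = E[X^2] - (E[X])^2, the population (ddof=0) variance
    x = np.asarray(x, dtype=float)
    return np.mean(x**2) - np.mean(x)**2

def covariance(x, y):
    # Cov(X, Y) = E[(X - E[X]) (Y - E[Y])]
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, so Cov(X, Y) = 2 * Var(X)
```

For these values, `variance(x)` is 1.25 and `covariance(x, y)` is 2.5, matching `np.var(x)` and the identity Cov(X, 2X) = 2 Var(X).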
Pearson's Correlation Coefficient
Standardizes covariance between -1 and 1:
Pearson’s Correlation Coefficient: $\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sigma_X\sigma_Y}$
- Tells us the strength of the linear relationship between two variables.
- The sign indicates the direction of the relationship.
Regression Coefficient
Indicates how much $Y$ changes, on average, as $X$ increases by one unit:
$\text{Reg}(X \text{ to } Y) = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}$
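Both the correlation coefficient and the regression coefficient are ratios built from the same covariance. A short sketch (illustrative names, assuming population moments):

```python
import numpy as np

def pearson_corr(x, y):
    # corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y), always in [-1, 1]
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

def reg_coef(x, y):
    # Reg(X to Y) = Cov(X, Y) / Var(X): average change in Y per unit of X
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / x.var()

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 4.0, 7.0, 10.0]   # exactly y = 3x + 1
```

Since the relationship is exactly linear, `pearson_corr(x, y)` is 1 (maximal strength) while `reg_coef(x, y)` is 3 (the slope): correlation captures strength and direction, the regression coefficient captures magnitude.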
Expected Value and Counterfactuals
Expected Value
Expected Value: The average of a large number of independent realizations of a variable.
Counterfactual
Used to estimate the effect of a treatment or intervention by comparing the observed outcome to what would have happened if the treatment were different.
Fundamental Problem of Causal Inference
We cannot observe the effect on a unit of being in one state of affairs versus some other state of affairs, because all other states of affairs are counterfactual.
- Assumption: Equivalence in all pre-treatment factors implies equivalence in potential outcomes.
Counterfactual Dependence
$X$ causes $Y$ if and only if:
- $Y$ occurs when $X$ occurs, and
- $Y$ would not have occurred in the counterfactual world where $X$ did not occur.
If the counterfactual is true, there is a causal story.
Treatment Effect = $Y_1 - Y_0$. Mechanism = the way that a cause leads to an effect.
Inferential Errors
Common Inferential Errors
- Confounders/Common Cause: Lead to bias; require ensuring equivalence between treatment and untreated groups.
- Selecting on the DV: Choosing cases for study based on the value of the dependent variable (DV), e.g., only examining cases where the outcome occurred.
- Reverse Causation: The outcome $Y$ actually causes $X$, rather than the other way around.
Potential Outcomes Framework
Definitions
- Potential Outcomes: $Y_0$: outcome with no treatment; $Y_1$: outcome with treatment.
- Treatment Groups: $T_1$: Received treatment; $T_0$: Did not receive treatment.
- Expected Value: Theoretical quantity that tells us the true average for a random outcome that follows a specific structure.
- Empirical Average: Average using data we actually observe. Empirical average differs from expected value due to noise and bias.
Treatment Effect Averages
- Population Average Treatment Effect (PATE) = $\mathbb{E}[Y_{1i}-Y_{0i}]$
- Average Treatment Effect on Treated (ATT) = $\mathbb{E}[Y_{1i}-Y_{0i} | T = 1]$
- Observed Average Treatment Effect (ATE) = $\mathbb{E}[Y_1 | T=1] - \mathbb{E}[Y_0 | T=0]$
ATE vs. ATT
What we get is the ATE (observed); what we want is the ATT.
- How far from ideal: $\text{ATE} - \text{ATT} = \mathbb{E}[Y_0 | T=1] - \mathbb{E}[Y_0 | T=0]$.
- Confounders create the imbalances between ATE and ATT, leading to over/underestimation.
- If $\text{ATE} - \text{ATT} = 0$, the groups are perfect substitutes (apples-to-apples comparison).
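The identity $\text{ATE} - \text{ATT} = \mathbb{E}[Y_0 | T=1] - \mathbb{E}[Y_0 | T=0]$ can be verified in a simulated potential-outcomes world. This is a hypothetical confounded setup (all names and parameters are illustrative), where a confounder raises both the baseline outcome and the chance of treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder raises both Y0 and the probability of treatment,
# so treated units have higher baseline outcomes.
confounder = rng.normal(size=n)
y0 = confounder + rng.normal(size=n)        # untreated potential outcome
y1 = y0 + 2.0                               # true unit-level effect is exactly 2
treated = (confounder + rng.normal(size=n)) > 0

pate = np.mean(y1 - y0)                                # = 2 by construction
att = np.mean(y1[treated] - y0[treated])               # also 2 here
ate_obs = y1[treated].mean() - y0[~treated].mean()     # observed comparison
bias = y0[treated].mean() - y0[~treated].mean()        # E[Y0|T=1] - E[Y0|T=0]
```

Here `bias` is positive (the treated group had higher baselines), so `ate_obs` overestimates `att`, and `ate_obs - att` equals `bias` exactly, as the identity says.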
Goal of Estimation
Goal: Estimate the ATT (or PATE) using the ATE. Estimate the unobservable quantity with observable data.
- Estimand = ATT (expected value), the unobserved quantity we want.
- Estimate = ATE (empirical value).
A useful equation: Estimate = Estimand + Bias + Noise $\implies$ Correlation $\approx$ Causation + Bias + Noise
Bias and Noise
Bias
Bias: Systematic error; the estimate misses the estimand in a particular direction.
- Unbiased if, over time, the average value of estimates equals the estimand.
- Bias is non-zero whenever $\mathbb{E}[Y_0|T=1] \ne \mathbb{E}[Y_0|T=0]$.
- If the bias term is greater than zero, $\text{ATE} > \text{ATT}$: overestimate (positive bias).
- If the bias term is less than zero, $\text{ATE} < \text{ATT}$: underestimate (negative bias).
- Bias does not disappear as sample size increases.
- Preventing Bias = Randomization.
Noise
Differences between our estimand and our estimate that arise due to idiosyncratic facts about our sample.
- Represents the spread of estimates, sampling variation, a statistical inference problem.
- As the sample gets larger, there is less noise; the average of estimates converges on the estimand.
- Law of Large Numbers (LLN): As $N$ gets large, the average of the sample (ATE) approaches the true mean: $\text{Avg}(Y|\text{Sample}) \to \mathbb{E}[Y]$.
- Central Limit Theorem (CLT): The distribution of the estimate approaches a normal distribution, with the variance of the noise shrinking as $N$ grows.
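The LLN and the shrinking noise can be seen by simulation. This sketch (illustrative setup: draws from a normal with known variance) measures the spread of sample means at several sample sizes and compares it to $\sqrt{\text{Var}(Y)/N}$:

```python
import numpy as np

rng = np.random.default_rng(1)
var_y = 4.0  # draws are N(0, sd=2), so Var(Y) = 4

def sd_of_estimates(n, reps=2000):
    # Draw `reps` samples of size n and look at the spread of their means.
    means = rng.normal(0.0, 2.0, size=(reps, n)).mean(axis=1)
    return means.std()

# Spread of estimates shrinks as N grows, tracking sqrt(Var(Y)/N)
for n in (25, 100, 400):
    print(n, sd_of_estimates(n), np.sqrt(var_y / n))
```

At each $N$ the simulated spread closely matches the theoretical value ($0.4$, $0.2$, $0.1$), illustrating why quadrupling the sample halves the noise.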
Estimation Procedures and Significance
Estimator Properties
- Estimator: The procedure we apply to data to generate a numerical result.
- Unbiased: If by repeating our estimation procedure infinitely many times, the average value of our estimates would equal the estimand.
- Precision: If by repeating our estimation procedure, the various estimates would be close to each other.
Standard Error and Confidence Intervals
Standard Error (SE): The standard deviation of the estimate across repeated samples; how far we expect the estimate to be from the estimand. We expect the estimand to lie within about 2 SE of the estimate (the basis of a 95% CI).
- $\mathbb{E}[\text{noise}] = 0$, $\text{SD}[\text{Noise}] = \sqrt{\frac{\text{Var}(Y)}{N}}$ (if the estimator is unbiased).
95% Confidence Interval: If we applied the estimator infinitely many times, each time on a new sample of data, the estimand would be contained in the 95% confidence interval 95 percent of the time. It is not true that we are 95 percent confident that the true estimand lies in the specific 95% confidence interval calculated from one sample.
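The repeated-sampling interpretation can be checked directly: build the interval estimate ± 2 SE on many fresh samples and count how often it contains the true value. A minimal sketch (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean, sd, n, reps = 5.0, 3.0, 200, 2000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)          # estimated standard error
    lo, hi = sample.mean() - 2 * se, sample.mean() + 2 * se
    covered += lo <= true_mean <= hi

coverage = covered / reps   # close to 0.95 across repeated samples
```

Roughly 95% of the intervals cover the estimand; any single interval either contains it or does not, which is why the "95% confident about this one interval" reading is wrong.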
Hypothesis Testing
Null Hypothesis ($H_0$): The hypothesis that some feature of the data is entirely the result of noise ($\text{ATT} = 0$).
P-value: The probability of observing a result at least as extreme as ours if the null hypothesis were true. If the p-value is less than 0.05, we have statistically significant evidence (at the 95% confidence level) that the relationship is real, and we reject the null hypothesis.
- Small p-value: The observed result would be unlikely under noise alone.
- High p-value: The observed result is plausible under noise alone ($\text{ATT} = 0$ might be true).
- If the null hypothesis of $\text{ATT} = 0$ is not in the CI, then we can reject the null, and the p-value must be small.
Statistical Significance: We can reject the null hypothesis at some prespecified level of confidence (typically, 95% confidence).
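Testing $H_0: \text{ATT} = 0$ typically reduces to a difference-in-means test. This sketch uses a normal-approximation (z) p-value rather than a full t-test, and all names and parameters are illustrative:

```python
import numpy as np
from math import erf, sqrt

def diff_in_means_test(y_treat, y_ctrl):
    # Difference-in-means estimate with a two-sided normal-approximation
    # p-value for H0: no treatment effect.
    y_treat, y_ctrl = np.asarray(y_treat, float), np.asarray(y_ctrl, float)
    est = y_treat.mean() - y_ctrl.mean()
    se = sqrt(y_treat.var(ddof=1) / len(y_treat)
              + y_ctrl.var(ddof=1) / len(y_ctrl))
    z = est / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # 2 * (1 - Phi(|z|))
    return est, se, p

rng = np.random.default_rng(3)
treat = rng.normal(1.0, 1.0, 500)   # simulated true effect = 1
ctrl = rng.normal(0.0, 1.0, 500)
est, se, p = diff_in_means_test(treat, ctrl)
```

Here the p-value is tiny, so we reject the null at the 95% level; equivalently, zero lies outside the interval est ± 2 SE, matching the CI/p-value link noted above.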
Experimental Design Considerations
Design Issues
- Blocking/Stratification: Dividing experimental subjects into different groups (typically groups believed to have similar potential outcomes) and then randomizing the treatment within each group.
- Noncompliance: When a subject chooses a treatment status other than the one to which it was assigned.
- Chance Imbalance: Despite random assignment, the treated and untreated groups differ in important ways because of noise.
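Blocking guards against chance imbalance by randomizing within groups of similar subjects. A minimal stdlib sketch (function and variable names are illustrative), assigning half of each block to treatment:

```python
import random

def block_randomize(units, block_key, seed=0):
    # Group units by a pre-treatment covariate, then randomize to
    # treatment/control *within* each block (half treated per block).
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_key(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = 1   # treated
        for u in members[half:]:
            assignment[u] = 0   # control
    return assignment

# Hypothetical: 20 subjects split into two blocks by a binary covariate.
subjects = list(range(20))
assign = block_randomize(subjects, block_key=lambda u: u % 2)
```

By construction each block contributes equally to treatment and control, so the blocking covariate cannot be imbalanced across groups.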
Statistical Power
Statistical Power: The probability of rejecting the null hypothesis of no effect if the true effect is of a certain non-zero magnitude.
- Lack of Statistical Power: The inability to detect a true effect if one exists (addressed by reducing noise or increasing $N$).
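Power can be estimated by simulation: repeatedly draw data with a known true effect and count how often the test rejects. A sketch under illustrative assumptions (normal outcomes, rejection when $|z| > 2$):

```python
import numpy as np

def power(effect, n, alpha_z=2.0, reps=1000, seed=4):
    # Estimated probability of rejecting H0 (|z| > alpha_z) when the
    # true effect has the given magnitude: simulated statistical power.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        treat = rng.normal(effect, 1.0, n)
        ctrl = rng.normal(0.0, 1.0, n)
        se = np.sqrt(treat.var(ddof=1) / n + ctrl.var(ddof=1) / n)
        z = (treat.mean() - ctrl.mean()) / se
        rejections += abs(z) > alpha_z
    return rejections / reps
```

For a fixed true effect of 0.5, `power(0.5, 20)` is modest while `power(0.5, 200)` is near 1: increasing $N$ (or otherwise reducing noise) is exactly how a lack of power is addressed.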
Other Issues
- Attrition: Experimental subjects drop out of the experiment, such that outcomes are not observed for those subjects.
- Interference: The situation where the treatment status of one unit affects the outcome of another unit (Contamination, Spillover).
Validity
- Internal Validity: The extent the experiment supports the claim about cause and effect; can we get an unbiased estimate of the treatment effect?
- External Validity: Generalization; what populations, settings, and treatment variables can this effect be generalized to?