Statistical Regression Models and Data Interpretation
Posted by Anonymous and classified in Mathematics
Executive Summary of Regression Models
- Simple Linear Regression: On average, for every 1-unit increase in [X], the expected [Y] changes by β1 units (95% CI: …).
- Multiplicative Model: On average, a 1-unit increase in [X] multiplies the median [Y] by exp(β1), resulting in a 100·(exp(β1)–1)% change (95% CI: …).
- Power Law/Elasticity: On average, a 1% increase in [X] is associated with approximately a β1% change in the median [Y] (95% CI: …).
- Categorical Variable: Students in Group A scored on average β1 units higher or lower than those in Group B (95% CI: …).
- Categorical Variable (3-Group): After adjusting for [X], students taught with Method 2 scored on average β2 units higher than those taught with Method 1 (the baseline); students taught with Method 3 scored on average |β3| units lower than those taught with Method 1.
- Interaction: For Group A, a 1-unit increase in X increases Y by β1. For Group B, the increase is (β1 + β3), showing that the effect of X differs between groups.
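The interpretations above come down to a few lines of arithmetic on the fitted coefficients. The sketch below shows that arithmetic in Python; the function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def percent_change_log_linear(beta1: float) -> float:
    """Percent change in median Y per 1-unit increase in X when log(Y) is modelled."""
    return 100.0 * (math.exp(beta1) - 1.0)


def percent_change_log_log(beta1: float, pct_increase_x: float = 1.0) -> float:
    """Exact percent change in median Y for a given percent increase in X (log-log model)."""
    return 100.0 * ((1.0 + pct_increase_x / 100.0) ** beta1 - 1.0)


def slope_for_group(beta1: float, beta3: float, in_group_b: bool) -> float:
    """Slope of X in an interaction model: beta1 for Group A, beta1 + beta3 for Group B."""
    return beta1 + beta3 if in_group_b else beta1


# Hypothetical coefficient values, not taken from any fitted model.
print(round(percent_change_log_linear(0.05), 2))     # about 5.13 (% per unit of X)
print(round(percent_change_log_log(0.8), 3))         # about 0.799 (% per 1% of X)
print(slope_for_group(2.0, -0.5, in_group_b=True))   # 1.5
```

The log-log result shows why "a β1% change" is an approximation: with β1 = 0.8, a 1% increase in X gives a change of about 0.799%, not exactly 0.8%.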
Understanding Statistical Intervals
Confidence Interval (CI): “We estimate the average [Y] for [X=…] to be between … and ….”
Prediction Interval (PI): “For an individual with [X=…], their [Y] is predicted to lie between … and ….”
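To see why a prediction interval is always wider than the confidence interval at the same X, here is a minimal sketch for simple linear regression using the standard textbook formulas. The data are made up, the function names are hypothetical, and the t critical value (2.306 for 8 degrees of freedom at 95%) is passed in by hand because the Python standard library has no t-distribution.

```python
import math


def fit_line(x, y):
    """Least-squares fit; returns intercept, slope, residual SD, mean of x, and Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    return b0, b1, s, xbar, sxx


def intervals(x, y, x0, t_crit):
    """Return (CI for the mean of Y, PI for an individual Y) at X = x0."""
    b0, b1, s, xbar, sxx = fit_line(x, y)
    n = len(x)
    fitted = b0 + b1 * x0
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)        # uncertainty in the mean only
    pi_half = t_crit * s * math.sqrt(1 + leverage)    # adds individual scatter
    return (fitted - ci_half, fitted + ci_half), (fitted - pi_half, fitted + pi_half)


# Made-up data for illustration only.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

ci, pi = intervals(x, y, x0=5.5, t_crit=2.306)
print("CI for the mean:", ci)
print("PI for an individual:", pi)
```

Both intervals are centred on the same fitted value; the extra `1 +` under the square root is the individual-level variability that the CI ignores.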
Mathematical Model Equations
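| Model | Equation |
|---|---|
| Simple Linear Regression | $Y = \beta_0 + \beta_1 X + \varepsilon$ |
| Quadratic Model | $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ |
| Log-Linear | $\log(Y) = \beta_0 + \beta_1 X + \varepsilon$ |
| Log-Log | $\log(Y) = \beta_0 + \beta_1 \log(X) + \varepsilon$ |
| Categorical Predictor | $Y = \beta_0 + \beta_1 G + \varepsilon$, where $G$ is a 0/1 group indicator |
| Multiple Groups | $Y = \beta_0 + \beta_1 X + \beta_2 M_2 + \beta_3 M_3 + \varepsilon$, where $M_2$, $M_3$ are 0/1 indicators and Method 1 is the baseline |
| Interaction (Numeric, Categorical) | $Y = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \times G) + \varepsilon$ |
| Interaction (Numeric, Numeric) | $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + \varepsilon$ |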
Analyzing Scatter Plot Trends and Shapes
Trend and Shape
- Clear increasing linear trend between X and Y.
- Clear exponential trend, either increasing or decreasing (e.g., distance–decay, growth vs. length, salary vs. experience).
- Relationship may be curved, indicating a quadratic pattern.
Group Differences
- Group A has a higher average Y than Group B across most X values.
- Difference is not consistent: groups overlap at low X but diverge at high X.
- Parallel lines: Groups differ in level (intercept) but not in slope.
- Non-parallel lines: Slopes differ, suggesting an interaction effect.
Variability and Spread
- Variability is larger at higher values of Y, indicating heteroscedasticity.
- Scatter seems fairly constant across the range of X.
- Transformation (e.g., log) makes the spread more even and symmetric.
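The effect of a log transformation on spread can be illustrated with a small deterministic example. In this sketch the data are hypothetical: Y follows the trend 2·X, with each point a factor of 1.2 above or below it, so the error is multiplicative.

```python
import math

# Hypothetical data: multiplicative error around the trend 2 * X.
x = [1, 2, 4, 8, 16, 32]
factors = [1.2, 1 / 1.2, 1.2, 1 / 1.2, 1.2, 1 / 1.2]
y = [2 * xi * f for xi, f in zip(x, factors)]

# On the raw scale the deviation from the trend grows with X (fanning).
raw_dev = [abs(yi - 2 * xi) for xi, yi in zip(x, y)]

# On the log scale the deviation is the same size at every X.
log_dev = [abs(math.log(yi) - math.log(2 * xi)) for xi, yi in zip(x, y)]

print([round(d, 2) for d in raw_dev])
print([round(d, 3) for d in log_dev])
```

When errors act as multipliers rather than additions, the raw scatter fans out while the logged scatter stays even, which is the pattern described in the bullets above.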
Special Cases
- Bimodality: One group shows two clusters (e.g., high vs. low performers).
- Outliers: Unusually low or high values relative to the rest of the data.
- Different groups sampled at different X ranges can mask trends if colors are ignored.
Interpretation Template
“Looking at the scatter plot, we see a [increasing / decreasing / exponential / linear / curved] relationship between X and Y. The [Group A] values are generally [higher/lower] than [Group B], and the lines appear [parallel/non-parallel], suggesting [no interaction / an interaction]. The scatter is [fairly constant / more variable at higher X], and a log transform would make the relationship more [linear/symmetric] with more even spread. There is also evidence of [outliers/bimodality/overlap at low X].”
Evaluating Model Appropriateness
Why a Log Model May Be Appropriate
- Variance Stabilization: Because higher values of Y showed larger variability, logging Y made the scatter more even (constant variance).
- Linearization: The original relationship looked exponential or curved, and logging made it approximately linear.
- Skewness: The response was right-skewed; logging made it more symmetric.
- Interpretability: Log allows effects to be expressed as percentage changes, which is often more suitable for financial or biological data.
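The linearization point can be checked numerically. The sketch below uses hypothetical, noise-free data from Y = 3·exp(0.4·X), fits a straight line to the raw values and to the logged values, and compares the residuals. The helper `least_squares` is written here for illustration only.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical exponential relationship with no noise.
x = list(range(9))
y = [3.0 * math.exp(0.4 * xi) for xi in x]

# Straight line on the raw scale: residuals are positive, negative, positive (curvature).
b0_raw, b1_raw = least_squares(x, y)
raw_resid = [yi - (b0_raw + b1_raw * xi) for xi, yi in zip(x, y)]

# Straight line on the log scale: residuals vanish and the slope recovers 0.4.
log_y = [math.log(yi) for yi in y]
b0_log, b1_log = least_squares(x, log_y)
log_resid = [ly - (b0_log + b1_log * xi) for xi, ly in zip(x, log_y)]

print([round(r, 1) for r in raw_resid])
print(round(b1_log, 3), round(math.exp(b0_log), 3))
```

The U-shaped raw residuals are the curvature a residual plot would reveal; after logging, the fitted slope is the growth rate and exp(intercept) is the multiplier at X = 0.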
Improving Normality and Variability
Yes case: The residuals show increasing spread with higher Y, so logging would reduce skewness and stabilize variance.
No case: The data is left-skewed or has smaller variance at higher values; logging would worsen the fit.
Determining the Best Fit
- Model X fits best because the residuals form a random horizontal band around zero with no curvature and fairly constant spread.
- Other models show curvature, fanning, or unequal spread, failing to meet assumptions.
- The log model gives the most patternless residuals, making it the preferred choice.
Contextual Model Explanations
Link the chosen model back to the real-world context:
- Log Model: Indicates a multiplicative relationship where each unit increase in X corresponds to a percentage change in Y.
- Quadratic Model: Indicates a curved relationship in which Y rises with X at first, but the rate of increase slows and eventually levels off or reverses. (Reference: 20x Test SS 24 Answers).
- Interaction Model: Shows that the effect of X on Y differs between groups (non-parallel lines).
Limitations of Single Measures of Center
- Different Groups / Bimodal: The plot shows two subgroups; therefore, the mean or median hides these distinct differences.
- Unequal Spread: One group has much larger variability, so comparing only centers ignores the difference in spread.
- Skewed Distribution: The distribution is skewed, so the mean and median differ; neither alone provides the full picture.
- Shape Differences: Although group centers are similar, the distributions differ in shape and spread, making a single measure misleading.
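A small made-up example shows how two groups can share the same mean and median yet look nothing alike. The scores below are hypothetical and chosen so that both centers equal 50.

```python
import statistics

# Hypothetical scores: one tight cluster vs. two separated clusters.
unimodal = [48, 49, 50, 50, 50, 51, 52]
bimodal = [20, 21, 22, 50, 78, 79, 80]

for name, scores in [("unimodal", unimodal), ("bimodal", bimodal)]:
    print(
        name,
        "mean =", statistics.mean(scores),
        "median =", statistics.median(scores),
        "sd =", round(statistics.stdev(scores), 1),
    )
```

Both groups report a mean and median of 50, so only the spread (or a plot) reveals that the second group is really two subgroups of low and high performers.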
Interpreting Evidence, P-values, and Effects
Is there evidence of a relationship between Y and X?
No. The slope for SOG is not significant (p = 0.414), so we have no evidence of a relationship between Drag and SOG.
Is there evidence for Drag vs. Trawl?
Yes. The Trawl effect is highly significant (p = 1.09 × 10⁻¹¹), providing strong evidence that Drag differs by trawl type.
Is the bat-wing net working as intended?
Yes. In the model, TrawlFR = +137.43 kg (95% CI: 102.15 to 172.71), meaning FR nets have approximately 137 kg more drag than the baseline (bat-wing). Therefore, bat-wing nets have lower drag and are working as intended.
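The reported interval can be sanity-checked with simple arithmetic. This sketch assumes the interval is symmetric about the estimate (as a t-based interval is); `summarize_ci` is a hypothetical helper name.

```python
def summarize_ci(lower, upper):
    """Return the midpoint, the margin of error, and whether the interval excludes zero."""
    estimate = (lower + upper) / 2
    margin = (upper - lower) / 2
    excludes_zero = lower > 0 or upper < 0
    return estimate, margin, excludes_zero


# Interval for TrawlFR quoted in the answer above.
estimate, margin, excludes_zero = summarize_ci(102.15, 172.71)
print(round(estimate, 2), round(margin, 2), excludes_zero)  # 137.43 35.28 True
```

The midpoint matches the quoted estimate of 137.43 kg, and the whole interval lies above zero, which is consistent with the conclusion that FR nets have more drag than the bat-wing baseline.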
Past Test Questions and Answers
2025 Summer School: