Statistical Regression Models and Data Interpretation

Posted by Anonymous and classified in Mathematics


Executive Summary of Regression Models

  • Simple Linear Regression: On average, for every 1-unit increase in [X], the expected [Y] changes by β1 units (95% CI: …).
  • Multiplicative Model: On average, a 1-unit increase in [X] multiplies the median [Y] by exp(β1), resulting in a 100·(exp(β1)–1)% change (95% CI: …).
  • Power Law/Elasticity: A 1% increase in [X] is associated with approximately a β1% change in the median [Y] (95% CI: …).
  • Categorical Variable: Students in Group A scored on average β1 units higher or lower than those in Group B (95% CI: …).
  • Categorical Variable (3-Group): After adjusting for [X], students taught with Method 2 scored on average β1 units higher than those taught with Method 1, while those taught with Method 3 scored β3 units lower than Method 1.
  • Interaction: For Group A, a 1-unit increase in X increases Y by β1. For Group B, the increase is (β1 + β3), showing that the effect of X differs between groups.
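
The coefficient readings in this list reduce to a little arithmetic. Here is a minimal Python sketch of that arithmetic; the function names and the coefficient values are hypothetical, chosen only for illustration, and do not come from any fitted model.

```python
import math

def percent_change_per_unit(b1):
    """Log-linear model: % change in median Y for a 1-unit increase in X."""
    return 100 * (math.exp(b1) - 1)

def percent_change_for_x_percent(b1, pct=1.0):
    """Log-log model: % change in median Y when X increases by pct percent."""
    return 100 * ((1 + pct / 100) ** b1 - 1)

def group_slopes(b1, b3):
    """Interaction model: slope of X for the baseline group and the other group."""
    return b1, b1 + b3

# Hypothetical coefficients (assumptions for illustration only).
print(round(percent_change_per_unit(0.05), 2))      # a bit above 5%
print(round(percent_change_for_x_percent(0.8), 3))  # close to b1 = 0.8
print(group_slopes(2.0, -0.5))                      # slopes for Group A and Group B
```

For small β1 the exact percentage 100·(exp(β1)–1) is close to 100·β1, which is why the power-law reading "a 1% increase in X gives about a β1% change in Y" is an approximation rather than an identity.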

Understanding Statistical Intervals

Confidence Interval (CI): “We estimate the average [Y] for [X=…] to be between … and ….”

Prediction Interval (PI): “For an individual with [X=…], their [Y] is predicted to lie between … and ….”
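
The difference between the two intervals can be seen in a small computation. The sketch below fits a simple linear regression by hand and builds both intervals at one X value. The data are made up, the function name `intervals_at` is hypothetical, and a normal quantile stands in for the t quantile, which is an approximation that only suits larger samples.

```python
import math
from statistics import NormalDist

def intervals_at(x0, xs, ys, level=0.95):
    """Return (ci, pi) at x0 for a simple linear regression of ys on xs.

    Assumption: uses a normal quantile instead of the t quantile.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)
    fit = b0 + b1 * x0
    # The CI reflects uncertainty in the mean only.
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    # The PI adds the scatter of one individual around that mean.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    ci = (fit - z * se_mean, fit + z * se_mean)
    pi = (fit - z * se_pred, fit + z * se_pred)
    return ci, pi

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
ci, pi = intervals_at(4.5, xs, ys)
print("CI for the average Y:", ci)
print("PI for an individual Y:", pi)
```

The prediction interval is always the wider of the two, because an individual value varies around the average in addition to the uncertainty in the average itself.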

Mathematical Model Equations

Models covered (the equation for each was an embedded image and is not recoverable):

  • Simple Linear Regression
  • Quadratic Model
  • Log-Linear
  • Log-Log
  • Categorical Predictor
  • Multiple Groups
  • Interaction (Numeric, Categorical)
  • Interaction (Numeric, Numeric)
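
For reference, the standard textbook forms of these eight models are sketched below. These are assumptions, not the original equations: the symbols D, D_2, D_3 (0/1 group indicators), X_1, X_2, and ε (the error term) are introduced here for illustration, and in fitted output the coefficient numbering depends on the order in which terms enter the model.

```latex
\begin{align*}
\text{Simple linear:} \quad & Y = \beta_0 + \beta_1 X + \varepsilon \\
\text{Quadratic:} \quad & Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \\
\text{Log-linear:} \quad & \log(Y) = \beta_0 + \beta_1 X + \varepsilon \\
\text{Log-log:} \quad & \log(Y) = \beta_0 + \beta_1 \log(X) + \varepsilon \\
\text{Categorical predictor:} \quad & Y = \beta_0 + \beta_1 D + \varepsilon \\
\text{Multiple groups:} \quad & Y = \beta_0 + \beta_1 D_2 + \beta_2 D_3 + \varepsilon \\
\text{Interaction (numeric, categorical):} \quad & Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 X D + \varepsilon \\
\text{Interaction (numeric, numeric):} \quad & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon
\end{align*}
```

In the numeric-categorical interaction form, setting D = 0 gives a slope of β1 for the baseline group and D = 1 gives β1 + β3 for the other group, which matches the interaction reading in the executive summary.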

Analyzing Scatter Plot Trends and Shapes

Trend and Shape

  • Clear increasing linear trend between X and Y.
  • Clear exponential trend, either decaying (e.g., distance-decay) or growing (e.g., growth vs. length, salary vs. experience).
  • Relationship may be curved, indicating a quadratic pattern.

Group Differences

  • Group A has a higher average Y than Group B across most X values.
  • Difference is not consistent: groups overlap at low X but diverge at high X.
  • Parallel lines: Groups differ in level (intercept) but not in slope.
  • Non-parallel lines: Slopes differ, suggesting an interaction effect.

Variability and Spread

  • Variability is larger at higher values of Y, indicating heteroscedasticity.
  • Scatter seems fairly constant across the range of X.
  • Transformation (e.g., log) makes the spread more even and symmetric.

Special Cases

  • Bimodality: One group shows two clusters (e.g., high vs. low performers).
  • Outliers: Unusually low or high values relative to the rest of the data.
  • Groups sampled over different X ranges can mask the overall trend if the group colors are ignored.

Interpretation Template

“Looking at the scatter plot, we see a [increasing / decreasing / exponential / linear / curved] relationship between X and Y. The [Group A] values are generally [higher/lower] than [Group B], and the lines appear [parallel/non-parallel], suggesting [no interaction / an interaction]. The scatter is [fairly constant / more variable at higher X], and a log transform would make the relationship more [linear/symmetric] with more even spread. There is also evidence of [outliers/bimodality/overlap at low X].”


Evaluating Model Appropriateness

Why a Log Model May Be Appropriate

  • Variance Stabilization: Because higher values of Y showed larger variability, logging Y made the scatter more even (constant variance).
  • Linearization: The original relationship looked exponential or curved, and logging made it approximately linear.
  • Skewness: The response was right-skewed; logging made it more symmetric.
  • Interpretability: A log scale allows effects to be expressed as percentage changes, which often suits financial or biological data.
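
The variance-stabilization point can be illustrated with a small deterministic sketch. It assumes hypothetical data with multiplicative noise, Y = exp(0.5·x) × factor, and uses the range as a crude measure of spread; none of these values come from the source.

```python
import math

# Hypothetical multiplicative noise factors applied at every x value.
factors = [0.8, 1.0, 1.25]
xs = [1, 2, 3, 4, 5, 6]

def spread(values):
    """Range (max minus min) as a crude measure of spread."""
    return max(values) - min(values)

raw_spread = []
log_spread = []
for x in xs:
    ys = [math.exp(0.5 * x) * f for f in factors]
    raw_spread.append(spread(ys))                         # spread on the original scale
    log_spread.append(spread([math.log(y) for y in ys]))  # spread after logging

print([round(s, 2) for s in raw_spread])  # grows with x (fanning)
print([round(s, 2) for s in log_spread])  # constant across x
```

On the original scale the spread fans out as Y grows; after taking logs the multiplicative noise becomes additive, so the spread is the same at every x.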

Improving Normality and Variability

Yes case: The residuals show increasing spread with higher Y, so logging would reduce skewness and stabilize variance.

No case: The data is left-skewed or has smaller variance at higher values; logging would worsen the fit.

Determining the Best Fit

  • Model X fits best because the residuals form a random horizontal band around zero with no curvature and fairly constant spread.
  • Other models show curvature, fanning, or unequal spread, failing to meet assumptions.
  • The log model gives the most patternless residuals, making it the preferred choice.
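
A residual check of this kind can be sketched numerically. The example below fits a straight line to two hypothetical data sets, one linear and one curved, and compares a crude curvature score of the residuals. The helper names and the score itself are illustrative assumptions rather than a standard diagnostic; a residual plot remains the usual tool.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

def residuals(xs, ys):
    """Observed minus fitted values from the straight-line fit."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def curvature_score(res):
    """Mean residual of the outer thirds minus the mean of the middle third.

    Assumes residuals are ordered by X. Near zero for a patternless band,
    far from zero when the residuals form a U or an arch.
    """
    k = len(res) // 3
    outer = res[:k] + res[-k:]
    middle = res[k:-k]
    return sum(outer) / len(outer) - sum(middle) / len(middle)

# Hypothetical data: one roughly linear set, one clearly curved set.
xs = list(range(1, 10))
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5]
linear_ys = [2 * x + 1 + e for x, e in zip(xs, noise)]
curved_ys = [x ** 2 for x in xs]

score_linear = curvature_score(residuals(xs, linear_ys))
score_curved = curvature_score(residuals(xs, curved_ys))
print(round(score_linear, 2), round(score_curved, 2))
```

The curved data leave a clear U shape in the residuals of the straight-line fit, so its score is far from zero, while the linear data give a score close to zero.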

Contextual Model Explanations

Link the chosen model back to the real-world context:

  • Log Model: Indicates a multiplicative relationship where each 1-unit increase in X corresponds to a constant percentage change in Y.
  • Quadratic Model: Indicates that Y increases with X initially, but the effect of X weakens and Y eventually levels off. (Reference: 20x Test SS 24 Answers).
  • Interaction Model: Shows that the effect of X on Y differs between groups (non-parallel lines).

Limitations of Single Measures of Center

  • Different Groups / Bimodal: The plot shows two subgroups; therefore, the mean or median hides these distinct differences.
  • Unequal Spread: One group has much larger variability, so comparing only centers ignores the difference in spread.
  • Skewed Distribution: The distribution is skewed, so the mean and median differ; neither alone provides the full picture.
  • Shape Differences: Although group centers are similar, the distributions differ in shape and spread, making a single measure misleading.
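
The bimodal case is easy to demonstrate with a few numbers. In this minimal sketch the scores are hypothetical, built to contain a low cluster and a high cluster.

```python
from statistics import mean, median

# Hypothetical scores with two clusters (low and high performers).
scores = [40, 42, 45, 47, 50, 80, 83, 85, 88, 90]

center_mean = mean(scores)
center_median = median(scores)

# Observations within 10 points of the mean: there are none.
near_center = [s for s in scores if abs(s - center_mean) <= 10]

print(center_mean, center_median, near_center)
```

Both the mean and the median land in the gap between the two clusters, so the "typical" value describes no actual student.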

Interpreting Evidence, P-values, and Effects

Is there evidence of a relationship between Y and X?

No. The slope for SOG is not significant (p = 0.414), so we have no evidence of a relationship between Drag and SOG.

Is there evidence that Drag differs by Trawl type?

Yes. The Trawl effect is highly significant (p = 1.09 × 10⁻¹¹), providing strong evidence that Drag differs by trawl type.

Is the bat-wing net working as intended?

Yes. In the model, TrawlFR = +137.43 kg (95% CI: 102.15 to 172.71), meaning FR nets have approximately 137 kg more drag than the baseline (bat-wing). Therefore, bat-wing nets have lower drag and are working as intended.
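
The reasoning in these three answers can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the 0.05 significance threshold is an assumption, since the notes do not state one.

```python
ALPHA = 0.05  # assumed significance threshold, not stated in the notes

p_sog = 0.414
p_trawl = 1.09e-11

estimate = 137.43                 # TrawlFR coefficient, kg
ci_low, ci_high = 102.15, 172.71  # 95% CI, kg

midpoint = (ci_low + ci_high) / 2     # should match the estimate
half_width = (ci_high - ci_low) / 2   # margin of error
excludes_zero = ci_low > 0 or ci_high < 0

print("SOG significant:", p_sog < ALPHA)
print("Trawl significant:", p_trawl < ALPHA)
print("CI midpoint:", round(midpoint, 2), "half-width:", round(half_width, 2))
print("CI excludes zero:", excludes_zero)
```

The interval is centered on the estimate and lies entirely above zero, which is consistent with the very small p-value for the Trawl effect and with FR nets having more drag than the bat-wing baseline.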


Past Test Questions and Answers

2025 Summer School:

