Statistical Problem Solving: Regression, Probability, and Bayes' Theorem
Child Weight Evolution: Linear Regression Analysis
The following table shows the evolution of the weight of a child between nine and fifteen months:
Data Table: Months (X) vs. Weight (Y)
| Months (X) | Weight (Y, kg) |
|---|---|
| 9 | 9.2 |
| 10 | 9.6 |
| 11 | 9.8 |
| 12 | 10.1 |
| 13 | 10.1 |
| 14 | 10.3 |
| 15 | 10.6 |
Regression Calculation Results
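The task is to find the linear regression line of X on Y, that is, to predict age (X, in months) from weight (Y, in kg). The table below adds the product X · Y for each observation, which is needed to compute the covariance.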
| X (Months) | Y (Weight) | X * Y |
|---|---|---|
| 9 | 9.2 | 82.8 |
| 10 | 9.6 | 96.0 |
| 11 | 9.8 | 107.8 |
| 12 | 10.1 | 121.2 |
| 13 | 10.1 | 131.3 |
| 14 | 10.3 | 144.2 |
| 15 | 10.6 | 159.0 |
Summary Statistics:
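- Means: $\bar{X} = 12$, $\bar{Y} \approx 9.957$, $\overline{XY} \approx 120.329$
- Standard deviations (population form, dividing by $n = 7$): $s_X = 2$, $s_Y \approx 0.430$
- Covariance: $s_{XY} = \overline{XY} - \bar{X}\,\bar{Y} \approx 0.843$
- Correlation coefficient: $r = s_{XY} / (s_X s_Y) \approx 0.979$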
Regression Line (X on Y):
$$X = 4.55 \cdot Y - 33.29$$
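For reference, the coefficients of this line can be recovered from the summary statistics. The sketch below assumes population (divide-by-n) moments and writes $s_{XY}$ for the covariance and $s_Y^2$ for the variance of Y; the symbols $a$ and $b$ are notation introduced here, not taken from the original:
$$b = \frac{s_{XY}}{s_Y^2} \approx \frac{0.8429}{0.1853} \approx 4.55, \qquad a = \bar{X} - b\,\bar{Y} \approx 12 - 4.5485 \cdot 9.9571 \approx -33.29$$
so that $X = a + b \cdot Y$. Unrounded intermediate values are used; plugging in the rounded figures 0.843 and 0.430 would give a slightly different slope (about 4.56).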
Prediction: Finding the age (X) when the weight (Y) is 11.5 kg.
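Substituting Y = 11.5 into the regression line (with unrounded coefficients) gives X ≈ 19.02 months. Because 11.5 kg lies outside the observed weight range (9.2 to 10.6 kg), this is an extrapolation and should be interpreted with caution.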
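The regression of X on Y and the prediction can be reproduced with a short Python sketch. It assumes population (divide-by-n) moments, which is what matches the summary statistics in this section; the function name `regress_x_on_y` is illustrative and not part of the original exercise.

```python
import math

# Data from the table: age in months (X) and weight in kg (Y)
months = [9, 10, 11, 12, 13, 14, 15]
weights = [9.2, 9.6, 9.8, 10.1, 10.1, 10.3, 10.6]

def regress_x_on_y(xs, ys):
    """Return (slope, intercept, r) for the line X = slope * Y + intercept.

    Uses population moments (dividing by n), an assumption chosen to
    match the summary statistics quoted in the text.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    slope = cov_xy / var_y                  # X on Y: divide by Var(Y), not Var(X)
    intercept = mean_x - slope * mean_y
    r = cov_xy / math.sqrt(var_x * var_y)   # Pearson correlation coefficient
    return slope, intercept, r

slope, intercept, r = regress_x_on_y(months, weights)
predicted_age = slope * 11.5 + intercept    # extrapolated age at 11.5 kg

print(f"X = {slope:.2f} * Y + ({intercept:.2f}), r = {r:.3f}")
print(f"Predicted age at 11.5 kg: {predicted_age:.2f} months")
```

Dividing the covariance by the variance of Y (rather than X) is what makes this the line of X on Y; swapping the roles would give the line for predicting weight from age instead.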
Probability of Health Conditions (H & A Events)
Among patients affected by a certain disease, 58% have hypertension (H) and 47% have high cholesterol (A). One-fifth (20%) have both conditions.
Given probabilities:
- P(H) = 0.58
- P(A) = 0.47
- P(H ∩ A) = 0.20 (One-fifth have both)
Description and Calculation of Events
We describe the following events and calculate their probabilities:
$H^c$ (H complement): Does not have hypertension.
$$P(H^c) = 1 - P(H) = 1 - 0.58 = \mathbf{0.42}$$
H ∪ A (H union A): Has hypertension or high cholesterol (or both).
$$P(H \cup A) = P(H) + P(A) - P(H \cap A) = 0.58 + 0.47 - 0.20 = \mathbf{0.85}$$
$H^c \cap A^c$ (H complement intersection A complement): Does not have hypertension and does not have high cholesterol.
$$P(H^c \cap A^c) = 1 - P(H \cup A) = 1 - 0.85 = \mathbf{0.15}$$
P(H | A): Has hypertension, given that they have high cholesterol (A is the population).
$$P(H | A) = \frac{P(H \cap A)}{P(A)} = \frac{0.20}{0.47} \approx \mathbf{0.4255}$$
P(A | H): Has high cholesterol, given that they have hypertension (H is the population).
$$P(A | H) = \frac{P(H \cap A)}{P(H)} = \frac{0.20}{0.58} \approx \mathbf{0.3448}$$
Independence Check
Are the events H and A independent?
No. The conditional probability $P(H | A) \approx 0.4255$ differs from the marginal probability $P(H) = 0.58$; equivalently, $P(H \cap A) = 0.20$ is not equal to $P(H) \cdot P(A) = 0.58 \cdot 0.47 = 0.2726$. If the events were independent, each pair of values would be equal.
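The event probabilities and the independence check can be verified with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the tolerance passed to `math.isclose` is an arbitrary choice for comparing floating-point values.

```python
import math

# Given probabilities
p_h = 0.58          # P(H): hypertension
p_a = 0.47          # P(A): high cholesterol
p_h_and_a = 0.20    # P(H ∩ A): both conditions

p_not_h = 1 - p_h                     # complement rule
p_h_or_a = p_h + p_a - p_h_and_a      # addition rule (inclusion-exclusion)
p_neither = 1 - p_h_or_a              # De Morgan: P(Hc ∩ Ac) = 1 - P(H ∪ A)
p_h_given_a = p_h_and_a / p_a         # conditional probability P(H | A)
p_a_given_h = p_h_and_a / p_h         # conditional probability P(A | H)

# H and A are independent only if P(H ∩ A) equals P(H) * P(A)
independent = math.isclose(p_h_and_a, p_h * p_a, abs_tol=1e-9)

print(f"P(Hc) = {p_not_h:.2f}, P(H ∪ A) = {p_h_or_a:.2f}, P(Hc ∩ Ac) = {p_neither:.2f}")
print(f"P(H | A) = {p_h_given_a:.4f}, P(A | H) = {p_a_given_h:.4f}")
print(f"P(H) * P(A) = {p_h * p_a:.4f}, independent: {independent}")
```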
Descriptive Statistics and Frequency Distribution
Frequency Table Analysis
| Real Limits ($L_i, L_s$) | Apparent Limits | Class Mark ($X_i$) | Frequency ($f_i$) | Cumulative Frequency ($N_i$) | Relative Frequency ($r_i$) | Cumulative Relative Frequency ($R_i$) |
|---|---|---|---|---|---|---|
| (39.95, 49.95) | 40.0 - 49.9 | 44.95 | 47 | 47 | 0.235 | 0.235 |
| (49.95, 59.95) | 50.0 - 59.9 | 54.95 | 83 | 130 | 0.415 | 0.650 |
| (59.95, 69.95) | 60.0 - 69.9 | 64.95 | 42 | 172 | 0.210 | 0.860 |
| (69.95, 79.95) | 70.0 - 79.9 | 74.95 | 28 | 200 | 0.140 | 1.000 |
Key Statistical Measures
Central 90% Range
The central 90% of the population lies between the 5th percentile (approximately 42.08) and the 95th percentile (approximately 76.38). Both values follow from linear interpolation within the real class limits of the table.
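The two percentiles can be reproduced by linear interpolation inside the class that contains each one. The sketch below assumes observations are spread uniformly within each class, the usual convention for grouped data; the helper `grouped_percentile` is a hypothetical name used only for illustration.

```python
# Real class limits (lower, upper) and absolute frequencies from the table
classes = [(39.95, 49.95), (49.95, 59.95), (59.95, 69.95), (69.95, 79.95)]
freqs = [47, 83, 42, 28]

def grouped_percentile(k, classes, freqs):
    """k-th percentile of grouped data by linear interpolation.

    Assumes values are uniformly spread within each class.
    """
    n = sum(freqs)
    target = k / 100 * n      # cumulative count that must be reached
    cumulative = 0            # cumulative frequency below the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:
            # Fraction of this class needed to reach the target count
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    return classes[-1][1]     # fallback: upper limit of the last class

p5 = grouped_percentile(5, classes, freqs)
p95 = grouped_percentile(95, classes, freqs)
print(f"Central 90% range: {p5:.2f} to {p95:.2f}")
```

For the 5th percentile the target count is 10 of 200, which falls in the first class; for the 95th it is 190, which falls in the last class after a cumulative frequency of 172.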
Mean and Standard Deviation
- Average (Mean): 57.50
- Standard Deviation: 9.70
Coefficient of Asymmetry (Skewness)
- Asymmetry Coefficient: 0.39
- Interpretation: Since the coefficient is positive, the data are slightly skewed to the right (the tail extends to the right).
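The mean, standard deviation, and asymmetry coefficient can be checked from the class marks and frequencies. The sketch below assumes the population standard deviation (dividing by n) and the moment-based (Fisher) skewness coefficient, $m_3 / s^3$; the original does not state which asymmetry coefficient was used, but this choice reproduces the reported 0.39.

```python
import math

# Class marks and absolute frequencies from the frequency table
marks = [44.95, 54.95, 64.95, 74.95]
freqs = [47, 83, 42, 28]

n = sum(freqs)
mean = sum(x * f for x, f in zip(marks, freqs)) / n

def central_moment(order):
    """Frequency-weighted central moment of the given order (divides by n)."""
    return sum(f * (x - mean) ** order for x, f in zip(marks, freqs)) / n

std_dev = math.sqrt(central_moment(2))         # population standard deviation
skewness = central_moment(3) / std_dev ** 3    # assumed Fisher moment coefficient

print(f"Mean: {mean:.2f}")
print(f"Standard deviation: {std_dev:.2f}")
print(f"Skewness: {skewness:.2f}")
```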
Bayes' Theorem Application in Medical Diagnosis
A doctor on a medical team believes, based on the clinical history and test results, that a patient may have one of three conditions, $E_1$, $E_2$, or $E_3$, with the following prior (a priori) probabilities:
- $P(E_1) = 0.40$
- $P(E_2) = 0.55$
- $P(E_3) = 0.05$
A new symptom, $S$, appears. The likelihoods of this symptom occurring given each disease are:
- $P(S | E_1) = 0.80$ (80% of people with $E_1$ have $S$)
- $P(S | E_2) = 0.30$ (30% of people with $E_2$ have $S$)
- $P(S | E_3) = 0.90$ (90% of people with $E_3$ have $S$)
Given the new symptom $S$, we use Bayes' Formula to compute the updated (posterior) probability that the patient has each of the three diseases, reporting the results to three decimal places.
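By the law of total probability, the overall probability of the symptom is:
$$P(S) = \sum_{i=1}^{3} P(E_i) \cdot P(S | E_i) = 0.40 \cdot 0.80 + 0.55 \cdot 0.30 + 0.05 \cdot 0.90 = 0.32 + 0.165 + 0.045 = 0.53$$
Applying Bayes' Formula, $P(E_i | S) = \frac{P(E_i) \cdot P(S | E_i)}{P(S)}$, to each disease:
$$P(E_1 | S) = \frac{0.32}{0.53} \approx \mathbf{0.604}$$
$$P(E_2 | S) = \frac{0.165}{0.53} \approx \mathbf{0.311}$$
$$P(E_3 | S) = \frac{0.045}{0.53} \approx \mathbf{0.085}$$
The symptom makes $E_1$ the most likely diagnosis, even though $E_2$ had the highest prior probability.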
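The posterior probabilities requested above can be computed with a short Python sketch of Bayes' Formula. The dictionary layout and the function name `posteriors` are illustrative choices, not part of the original problem.

```python
# Prior probabilities P(E_i) and likelihoods P(S | E_i) from the problem statement
priors = {"E1": 0.40, "E2": 0.55, "E3": 0.05}
likelihoods = {"E1": 0.80, "E2": 0.30, "E3": 0.90}

def posteriors(priors, likelihoods):
    """Return ({E_i: P(E_i | S)}, P(S)) using Bayes' Formula."""
    # Joint probabilities P(E_i ∩ S) = P(E_i) * P(S | E_i)
    joint = {e: priors[e] * likelihoods[e] for e in priors}
    # Law of total probability: P(S) is the sum of the joint probabilities
    evidence = sum(joint.values())
    return {e: joint[e] / evidence for e in joint}, evidence

post, p_s = posteriors(priors, likelihoods)

print(f"P(S) = {p_s:.2f}")
for disease, p in post.items():
    print(f"P({disease} | S) = {p:.3f}")
```

Because the three diseases are treated as mutually exclusive and exhaustive, the posteriors sum to 1, which is a quick sanity check on the result.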