Statistical Problem Solving: Regression, Probability, and Bayes' Theorem

Child Weight Evolution: Linear Regression Analysis

The following table shows the evolution of the weight of a child between nine and fifteen months:

Data Table: Months (X) vs. Weight (Y)

| Months (X) | Weight (Y, kg) |
|---|---|
| 9 | 9.2 |
| 10 | 9.6 |
| 11 | 9.8 |
| 12 | 10.1 |
| 13 | 10.1 |
| 14 | 10.3 |
| 15 | 10.6 |

Regression Calculation Results

The task is to find the least-squares regression line of X on Y, that is, the line that predicts age (X) from weight (Y).

| X (Months) | Y (Weight, kg) | X * Y |
|---|---|---|
| 9 | 9.2 | 82.8 |
| 10 | 9.6 | 96.0 |
| 11 | 9.8 | 107.8 |
| 12 | 10.1 | 121.2 |
| 13 | 10.1 | 131.3 |
| 14 | 10.3 | 144.2 |
| 15 | 10.6 | 159.0 |

Summary Statistics:

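  • Means: $\bar{X} = 12$ months, $\bar{Y} = 9.957$ kg, $\overline{XY} = 120.329$
  • Standard deviations (population form, dividing by $n = 7$): $\sigma_X = 2$, $\sigma_Y = 0.430$
  • Covariance: $\sigma_{XY} = \overline{XY} - \bar{X}\,\bar{Y} = 0.843$
  • Correlation coefficient: $r = \sigma_{XY} / (\sigma_X \sigma_Y) = 0.979$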
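The summary statistics can be reproduced with a short Python sketch. It assumes the population (divide-by-n) definitions of standard deviation and covariance, since those match the reported figures; the variable names are illustrative and do not come from the original notes.

```python
from statistics import mean, pstdev

# Data from the table: age in months (X) and weight in kg (Y).
months = [9, 10, 11, 12, 13, 14, 15]
weights = [9.2, 9.6, 9.8, 10.1, 10.1, 10.3, 10.6]

mean_x = mean(months)
mean_y = mean(weights)
mean_xy = mean(x * y for x, y in zip(months, weights))

# Population standard deviations (divide by n, not n - 1); this is an
# assumption chosen because it reproduces the reported values.
sd_x = pstdev(months)
sd_y = pstdev(weights)

# Population covariance as "mean of products minus product of means".
cov_xy = mean_xy - mean_x * mean_y

# Pearson correlation coefficient.
r = cov_xy / (sd_x * sd_y)

print(f"means: {mean_x:.3f}, {mean_y:.3f}, {mean_xy:.3f}")
print(f"standard deviations: {sd_x:.3f}, {sd_y:.3f}")
print(f"covariance: {cov_xy:.3f}, correlation: {r:.3f}")
```

Using the sample (n - 1) formulas instead would change the standard deviations and the covariance but leave the correlation coefficient unchanged.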

Regression Line (X on Y):

$$X = 4.55 \cdot Y - 33.29$$

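Prediction: find the age (X) at which the weight (Y) is 11.5 kg.

Substituting Y = 11.5 into the regression line (using the unrounded coefficients) gives X ≈ 19.02 months. Because 11.5 kg lies above the heaviest observed weight (10.6 kg), this is an extrapolation and should be read with caution.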
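As a rough check of the regression line and the prediction, the following Python sketch fits X on Y by least squares. The helper name `predict_age` is hypothetical and introduced only for illustration.

```python
# Data from the table: age in months (X) and weight in kg (Y).
months = [9, 10, 11, 12, 13, 14, 15]
weights = [9.2, 9.6, 9.8, 10.1, 10.1, 10.3, 10.6]
n = len(months)

mean_x = sum(months) / n
mean_y = sum(weights) / n

# Regression of X on Y: slope = S_xy / S_yy, so Y plays the role of predictor.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, weights))
s_yy = sum((y - mean_y) ** 2 for y in weights)
slope = s_xy / s_yy
intercept = mean_x - slope * mean_y


def predict_age(weight_kg):
    """Predicted age in months for a given weight (hypothetical helper)."""
    return slope * weight_kg + intercept


print(f"X = {slope:.2f} * Y + ({intercept:.2f})")
print(f"Predicted age at 11.5 kg: {predict_age(11.5):.2f} months")
```

Note that the divisor is the variation in Y, not in X. Regressing Y on X and then inverting the line would generally give a different result unless the correlation is perfect.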

Probability of Health Conditions (H & A Events)

Among people affected by a certain disease, 58% have hypertension (H) and 47% have high cholesterol (A). One-fifth (20%) have both.

Given probabilities:

  • P(H) = 0.58
  • P(A) = 0.47
  • P(H ∩ A) = 0.20 (One-fifth have both)

Description and Calculation of Events

We describe the following events and calculate their probabilities:

  1. $H^c$ (H complement): Does not have hypertension.

    $$P(H^c) = 1 - P(H) = 1 - 0.58 = \mathbf{0.42}$$

  2. H ∪ A (H union A): Has hypertension or high cholesterol (or both).

    $$P(H \cup A) = P(H) + P(A) - P(H \cap A) = 0.58 + 0.47 - 0.20 = \mathbf{0.85}$$

  3. $H^c \cap A^c$ (H complement intersection A complement): Does not have hypertension and does not have high cholesterol.

    $$P(H^c \cap A^c) = 1 - P(H \cup A) = 1 - 0.85 = \mathbf{0.15}$$

  4. $H | A$ (H given A): Has hypertension, given that the person has high cholesterol (the reference population is restricted to A).

    $$P(H | A) = \frac{P(H \cap A)}{P(A)} = \frac{0.20}{0.47} \approx \mathbf{0.4255}$$

  5. $A | H$ (A given H): Has high cholesterol, given that the person has hypertension (the reference population is restricted to H).

    $$P(A | H) = \frac{P(H \cap A)}{P(H)} = \frac{0.20}{0.58} \approx \mathbf{0.3448}$$

Independence Check

Are the events H and A independent?

No. The conditional probability $P(H | A) \approx 0.4255$ is not equal to the marginal probability $P(H) = 0.58$. Equivalently, $P(H) \cdot P(A) = 0.58 \times 0.47 = 0.2726 \neq 0.20 = P(H \cap A)$. If the events were independent, both equalities would hold.
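The event probabilities and the independence check can be verified with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the tolerance passed to `isclose` is an arbitrary choice.

```python
from math import isclose

p_h = 0.58        # P(H): hypertension
p_a = 0.47        # P(A): high cholesterol
p_h_and_a = 0.20  # P(H ∩ A): both conditions

p_not_h = 1 - p_h                  # complement rule
p_h_or_a = p_h + p_a - p_h_and_a   # inclusion-exclusion
p_neither = 1 - p_h_or_a           # De Morgan: P(Hc ∩ Ac) = 1 - P(H ∪ A)
p_h_given_a = p_h_and_a / p_a      # conditional probability
p_a_given_h = p_h_and_a / p_h

# Independence would require P(H ∩ A) = P(H) * P(A).
independent = isclose(p_h_and_a, p_h * p_a, abs_tol=1e-9)

print(f"P(Hc) = {p_not_h:.2f}")
print(f"P(H ∪ A) = {p_h_or_a:.2f}")
print(f"P(Hc ∩ Ac) = {p_neither:.2f}")
print(f"P(H | A) = {p_h_given_a:.4f}")
print(f"P(A | H) = {p_a_given_h:.4f}")
print(f"P(H) * P(A) = {p_h * p_a:.4f}, independent: {independent}")
```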

Descriptive Statistics and Frequency Distribution

Frequency Table Analysis

| Real Limits ($L_i, L_s$) | Apparent Limits | Class Mark ($X_i$) | Frequency ($f_i$) | Cumulative Frequency ($N_i$) | Relative Frequency ($r_i$) | Cumulative Relative Frequency ($R_i$) |
|---|---|---|---|---|---|---|
| (39.95, 49.95) | 40.0 - 49.9 | 44.95 | 47 | 47 | 0.235 | 0.235 |
| (49.95, 59.95) | 50.0 - 59.9 | 54.95 | 83 | 130 | 0.415 | 0.650 |
| (59.95, 69.95) | 60.0 - 69.9 | 64.95 | 42 | 172 | 0.210 | 0.860 |
| (69.95, 79.95) | 70.0 - 79.9 | 74.95 | 28 | 200 | 0.140 | 1.000 |

Key Statistical Measures

Central 90% Range

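The central 90% of the population lies between the 5th percentile ($P_5$) and the 95th percentile ($P_{95}$). With $N = 200$ observations, these correspond to cumulative frequencies of 10 and 190, which fall in the first and last classes. Linear interpolation within those classes (class width 10) gives:

$$P_5 = 39.95 + \frac{10 - 0}{47} \cdot 10 \approx \mathbf{42.08}$$

$$P_{95} = 69.95 + \frac{190 - 172}{28} \cdot 10 \approx \mathbf{76.38}$$

The central 90% therefore lies between 42.08 and 76.38.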
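Percentiles for grouped data are usually obtained by linear interpolation inside the class that contains the target cumulative frequency. The sketch below assumes that method and that observations are spread uniformly within each class; the function name `grouped_percentile` is hypothetical.

```python
# Each class: (lower real limit, upper real limit, frequency), from the table.
classes = [
    (39.95, 49.95, 47),
    (49.95, 59.95, 83),
    (59.95, 69.95, 42),
    (69.95, 79.95, 28),
]


def grouped_percentile(classes, k):
    """k-th percentile via linear interpolation within the containing class."""
    total = sum(freq for _, _, freq in classes)
    target = k / 100 * total  # cumulative frequency to reach
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    return classes[-1][1]  # fallback: upper limit of the last class


p5 = grouped_percentile(classes, 5)
p95 = grouped_percentile(classes, 95)
print(f"Central 90% range: {p5:.2f} to {p95:.2f}")
```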

Mean and Standard Deviation

  • Average (Mean): 57.50
  • Standard Deviation: 9.70

Coefficient of Asymmetry (Skewness)

  • Asymmetry Coefficient: 0.39
  • Interpretation: Since the coefficient is positive, the data are slightly skewed to the right (the tail extends to the right).
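The mean, standard deviation, and asymmetry coefficient can be recomputed from the class marks and frequencies. This sketch assumes the population standard deviation and the moment-based (Fisher) skewness coefficient, $m_3 / \sigma^3$; the notes do not say which definition was used, but this one reproduces the reported 0.39.

```python
from math import sqrt

# (class mark, frequency) pairs from the frequency table.
marks_freqs = [(44.95, 47), (54.95, 83), (64.95, 42), (74.95, 28)]
n = sum(freq for _, freq in marks_freqs)

# Weighted mean of the class marks.
mean = sum(mark * freq for mark, freq in marks_freqs) / n


def central_moment(order):
    """Frequency-weighted central moment of the given order (divide by n)."""
    return sum(freq * (mark - mean) ** order for mark, freq in marks_freqs) / n


sd = sqrt(central_moment(2))
# Assumed definition: third central moment divided by the cubed standard deviation.
skewness = central_moment(3) / sd ** 3

print(f"mean = {mean:.2f}, sd = {sd:.2f}, skewness = {skewness:.2f}")
```

Other asymmetry measures, such as Pearson's mode-based coefficient, would give a different number for the same table.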

Bayes' Theorem Application in Medical Diagnosis

A doctor on a medical team believes, based on clinical history and tests, that a patient may have one of three conditions, $E_1$, $E_2$, or $E_3$, with the following a priori probabilities:

  • $P(E_1) = 0.40$
  • $P(E_2) = 0.55$
  • $P(E_3) = 0.05$

A new symptom, $S$, appears. The likelihoods of this symptom occurring given each disease are:

  • $P(S | E_1) = 0.80$ (80% of people with $E_1$ have $S$)
  • $P(S | E_2) = 0.30$ (30% of people with $E_2$ have $S$)
  • $P(S | E_3) = 0.90$ (90% of people with $E_3$ have $S$)

Given that the new symptom $S$ is present, we use Bayes' Formula to calculate the updated (posterior) probabilities that the patient has each of the three diseases, giving the results to three decimal places.

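By the law of total probability, the overall probability of the symptom is:

$$P(S) = \sum_{i=1}^{3} P(E_i) \cdot P(S | E_i) = 0.40 \cdot 0.80 + 0.55 \cdot 0.30 + 0.05 \cdot 0.90 = 0.32 + 0.165 + 0.045 = 0.53$$

Bayes' Formula then gives the posterior probabilities:

$$P(E_1 | S) = \frac{P(E_1) \cdot P(S | E_1)}{P(S)} = \frac{0.32}{0.53} \approx \mathbf{0.604}$$

$$P(E_2 | S) = \frac{P(E_2) \cdot P(S | E_2)}{P(S)} = \frac{0.165}{0.53} \approx \mathbf{0.311}$$

$$P(E_3 | S) = \frac{P(E_3) \cdot P(S | E_3)}{P(S)} = \frac{0.045}{0.53} \approx \mathbf{0.085}$$

The three posteriors sum to 1. Observing $S$ raises the probability of $E_1$ from 0.40 to 0.604 and of $E_3$ from 0.05 to 0.085, and lowers that of $E_2$ from 0.55 to 0.311.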
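The posterior update for this problem can be sketched numerically as follows. The dictionary names are illustrative, and the sketch assumes $E_1$, $E_2$, and $E_3$ are mutually exclusive and exhaustive, which is consistent with the priors summing to 1.

```python
# Prior probabilities P(E_i) and likelihoods P(S | E_i) from the problem statement.
priors = {"E1": 0.40, "E2": 0.55, "E3": 0.05}
likelihoods = {"E1": 0.80, "E2": 0.30, "E3": 0.90}

# Law of total probability: P(S) = sum over i of P(E_i) * P(S | E_i).
p_s = sum(priors[e] * likelihoods[e] for e in priors)

# Bayes' formula: P(E_i | S) = P(E_i) * P(S | E_i) / P(S).
posteriors = {e: priors[e] * likelihoods[e] / p_s for e in priors}

print(f"P(S) = {p_s:.3f}")
for disease, prob in posteriors.items():
    print(f"P({disease} | S) = {prob:.3f}")
```

Dividing by `p_s` is what normalizes the posteriors so that they sum to 1.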
