Optimal Estimators, Dice Posterior & Statistical Problems
Combine Independent Unbiased Estimators
Let d1 and d2 be independent unbiased estimators of θ with variances σ1^2 and σ2^2, respectively:
- E[di] = θ for i = 1, 2.
- Var(di) = σi^2.
Any estimator of the form d = λ d1 + (1 - λ) d2 is also unbiased for any constant λ.
The variance (mean square error for an unbiased estimator) is
Var(d) = λ^2 σ1^2 + (1 - λ)^2 σ2^2.
To minimize Var(d) with respect to λ, differentiate and set to zero:
d/dλ Var(d) = 2λ σ1^2 - 2(1 - λ) σ2^2 = 0.
Solving gives the optimal weight
λ* = σ2^2 / (σ1^2 + σ2^2).
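As a sanity check, the closed-form optimum can be compared against a brute-force grid search over λ. The variances σ1^2 = 4 and σ2^2 = 1 below are illustrative values, not taken from the text:

```python
# Numeric sanity check of the optimal weight lambda* = sigma2^2 / (sigma1^2 + sigma2^2).
# The variances 4 and 1 are illustrative assumptions.

def combined_variance(lam, s1sq, s2sq):
    """Variance of d = lam*d1 + (1-lam)*d2 for independent d1, d2."""
    return lam**2 * s1sq + (1 - lam)**2 * s2sq

s1sq, s2sq = 4.0, 1.0
lam_star = s2sq / (s1sq + s2sq)  # closed-form optimum: 1/5

# Scan a grid of weights: none should beat the closed-form optimum.
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=lambda lam: combined_variance(lam, s1sq, s2sq))

print(lam_star)                                  # 0.2
print(combined_variance(lam_star, s1sq, s2sq))   # 0.8 = sigma1^2 sigma2^2 / (sigma1^2 + sigma2^2)
```

Note that the minimized variance equals σ1^2 σ2^2 / (σ1^2 + σ2^2), which is smaller than either individual variance.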
Question 1: Posterior PMF for a Third Dice Roll
Assume there are five dice with sides {4, 6, 8, 12, 20}. One of these five dice is selected uniformly at random (probability 1/5) and rolled twice. The two observed results are 5 and 9. What is the posterior probability mass function (pmf) for the outcome of a third roll?
Since only the 12-sided and 20-sided dice can produce both results 5 and 9, the posterior probabilities for the chosen die D are:
- P(D = 12 | 5,9) = 25/34
- P(D = 20 | 5,9) = 9/34
- P(D = k | 5,9) = 0 for k in {4,6,8}.
Therefore the pmf of a third roll X3 given the two observations is, for integer x:
- For x in {1,2,...,12}:
P(X3 = x | 5,9) = (25/34) * (1/12) + (9/34) * (1/20) = 19/255.
- For x in {13,14,...,20}:
P(X3 = x | 5,9) = (9/34) * (1/20) = 9/680.
- Otherwise: P(X3 = x | 5,9) = 0.
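The posterior and predictive pmf above can be reproduced exactly with Python's `fractions` module (uniform 1/5 prior; since the prior is uniform it cancels in the posterior ratio):

```python
from fractions import Fraction

# Exact posterior for the die and predictive pmf of a third roll,
# given observations 5 and 9 and a uniform prior over the five dice.

dice = [4, 6, 8, 12, 20]
obs = [5, 9]

# Likelihood of the observations for each die (0 if any observation exceeds the die).
lik = {d: Fraction(1, d) ** len(obs) if all(x <= d for x in obs) else Fraction(0)
       for d in dice}
total = sum(lik.values())
posterior = {d: lik[d] / total for d in dice}

# Predictive pmf of a third roll X3.
def pmf_x3(x):
    return sum(posterior[d] * Fraction(1, d) for d in dice if 1 <= x <= d)

print(posterior[12], posterior[20])  # 25/34 9/34
print(pmf_x3(5), pmf_x3(15))         # 19/255 9/680
```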
Question 2: Sum of 1000 Fair Dice Rolls (CLT Expression)
Consider a fair six-sided die with faces 1 through 6. Let S be the sum of 1000 independent tosses. Using the Central Limit Theorem, express the approximate probability P(3000 ≤ S ≤ 4000) as an integral over the standard normal density.
For a single fair die:
- Mean per roll: μ = 3.5.
- Variance per roll: σ^2 = Var(X) = 35/12.
For S = sum of 1000 iid rolls: E[S] = 1000μ = 3500 and Var(S) = 1000 * (35/12). Let σS = sqrt(1000 * 35/12).
By the CLT, approximately
P(3000 ≤ S ≤ 4000) ≈ integral from a to b of phi(z) dz, where
- a = (3000 - 3500) / σS,
- b = (4000 - 3500) / σS,
- and phi(z) is the standard normal density phi(z) = (1 / sqrt(2π)) e^{-z^2/2}.
Equivalently:
P(3000 ≤ S ≤ 4000) ≈ ∫ from (3000-3500)/σS to (4000-3500)/σS of (1 / √(2π)) e^{-z^2/2} dz.
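To put a number on this expression, the standard normal CDF Φ can be written via `math.erf`; the limits a and b turn out to lie so far in the tails that the approximation is essentially 1:

```python
import math

# Evaluate the CLT approximation numerically.
# Phi is the standard normal CDF, expressed via math.erf (no external libraries).

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_S = 1000 * 3.5                      # E[S] = 3500
sigma_S = math.sqrt(1000 * 35 / 12)    # sd of S, about 54.01

a = (3000 - mu_S) / sigma_S            # about -9.26
b = (4000 - mu_S) / sigma_S            # about +9.26
approx = Phi(b) - Phi(a)

print(round(sigma_S, 2))  # 54.01
print(approx)             # 1.0 to double precision
```

Since both limits are more than nine standard deviations from the mean, the event is effectively certain under the normal approximation.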
Question 3: Convex Combination of Two Unbiased Estimators
Let T1 and T2 be two independent unbiased estimators of the mean μ (i.e., E[T1] = E[T2] = μ).
- Show that for any 0 ≤ λ ≤ 1, the estimator T = λ T1 + (1 - λ) T2 is unbiased.
- Assume Var(T1) = σ1^2 and Var(T2) = σ2^2. Find the value of λ that minimizes the mean square error (equivalently, the variance) of T.
Solution sketch:
- Unbiasedness: E[T] = λ E[T1] + (1-λ) E[T2] = λμ + (1-λ)μ = μ.
- Variance: Var(T) = λ^2 σ1^2 + (1-λ)^2 σ2^2. Minimize with respect to λ to obtain
λ = σ2^2 / (σ1^2 + σ2^2).
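A quick Monte Carlo check of the unbiasedness and variance formula, assuming (purely for illustration) that T1 and T2 are normal with mean μ = 10 and standard deviations 3 and 2:

```python
import random

# Monte Carlo check of E[T] = mu and Var(T) = lam^2 s1^2 + (1-lam)^2 s2^2
# for T = lam*T1 + (1-lam)*T2.  The normal distributions and the values
# mu = 10, s1 = 3, s2 = 2 are illustrative assumptions, not from the text.

random.seed(0)
mu, s1, s2 = 10.0, 3.0, 2.0
lam = s2**2 / (s1**2 + s2**2)        # optimal weight 4/13

n = 200_000
samples = [lam * random.gauss(mu, s1) + (1 - lam) * random.gauss(mu, s2)
           for _ in range(n)]
mean = sum(samples) / n
var = sum((t - mean) ** 2 for t in samples) / n

predicted = lam**2 * s1**2 + (1 - lam)**2 * s2**2   # = 36/13, about 2.77
print(round(mean, 2), round(var, 2))
```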
Question 4: Hypothesis Test for a Fair Coin
A null hypothesis tests whether a coin is fair: H0: ψ = 1/2. The rejection rule is: in 5 independent flips, reject H0 if at least 4 outcomes are of the same type (i.e., 4 or 5 heads, or 4 or 5 tails).
- Significance level (α):
Under H0 (p = 1/2),
α = 2 * [C(5,4) (1/2)^5 + C(5,5) (1/2)^5] = 2 * (5/32 + 1/32) = 12/32 = 3/8 = 0.375.
- Power when HA: ψ = 2/3:
Let p = P(heads) = 2/3. The power is P(reject H0 | p = 2/3) = P(4 heads) + P(5 heads) + P(4 tails) + P(5 tails).
Explicitly:
P = 5 (2/3)^4 (1/3) + (2/3)^5 + 5 (1/3)^4 (2/3) + (1/3)^5.
This simplifies to 123/243 = 41/81 ≈ 0.50617.
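Both the significance level and the power can be computed exactly with binomial probabilities:

```python
from fractions import Fraction
from math import comb

# Exact significance level and power for the 5-flip test
# (reject H0 when at least 4 flips agree).

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def reject_prob(p):
    # P(4 or 5 heads) + P(4 or 5 tails), with p = P(heads).
    # The two events are disjoint with only 5 flips.
    return (sum(binom_pmf(k, 5, p) for k in (4, 5))
            + sum(binom_pmf(k, 5, 1 - p) for k in (4, 5)))

alpha = reject_prob(Fraction(1, 2))   # 3/8
power = reject_prob(Fraction(2, 3))   # 41/81
print(alpha, power, float(power))     # 3/8 41/81 0.5061728395061729
```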
Question 5: Location PDF f(x) = 0.5 e^{-|x - θ|}
Assume the probability density function is the Laplace (double exponential) location family:
f(x) = 0.5 e^{-|x - θ|}, for x in R and parameter θ (location).
- Mean: Because the distribution is symmetric about θ, E[X] = θ.
- MSE for an estimator given N iid samples:
The maximum likelihood estimator (and the minimum-mean-absolute-deviation estimator) for the location parameter is the sample median. For symmetric distributions, the sample median is an unbiased estimator of θ (for odd N) and the MSE is Var(median) (since bias = 0). A closed-form variance depends on N and the underlying density; in general
MSE(median) = Var(median) = 1 / (4 N f(θ)^2) + o(1/N) asymptotically, where f(θ) is the pdf evaluated at the true location (here f(θ) = 0.5, so the asymptotic MSE is 1/N).
- N = 3 and samples 1, 3, 5:
The sample median is 3, so the MLE is θ_MLE = 3. The MSE of this estimator as a random quantity depends on the true θ, but the point estimate is 3.
- General MLE for N samples:
The MLE of θ given iid Laplace samples is any median of the sample (for odd N the unique sample median). For even N, any value between the two middle order statistics maximizes the likelihood.
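The claim that the sample median maximizes the Laplace likelihood can be checked directly: up to the constant N log 2, the negative log-likelihood is the total absolute deviation Σ |x_i - θ|, minimized here on a grid for the sample (1, 3, 5) from the text:

```python
# For Laplace samples the negative log-likelihood is N*log(2) + sum |x_i - theta|,
# so the MLE minimizes the total absolute deviation -- i.e. it is the sample median.

samples = [1.0, 3.0, 5.0]

def neg_log_lik(theta, xs):
    # Total absolute deviation; the constant N*log(2) is dropped.
    return sum(abs(x - theta) for x in xs)

grid = [i / 100 for i in range(0, 601)]   # theta in [0, 6]
theta_hat = min(grid, key=lambda t: neg_log_lik(t, samples))

print(theta_hat)                  # 3.0 (the sample median)
print(neg_log_lik(3.0, samples))  # 4.0
```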
Question 6: Combining Two Measurements to Estimate Building Height
We measure a building height parameter β using two measurements at distances with known functions a(ρ). For simplicity denote a1 = a(ρ1) and a2 = a(ρ2). Observations are:
- y1 = a1 β + ε1,
- y2 = a2 β + ε2,
where ε1, ε2 are independent N(0,1) errors.
An engineer forms the weighted sum c1 y1 + c2 y2 to estimate β. We seek weights that minimize MSE.
- Minimize MSE under unbiasedness (BLUE):
For unbiasedness require c1 a1 + c2 a2 = 1. Minimize Var(c1 y1 + c2 y2) = c1^2 + c2^2 subject to the constraint. The solution (Gauss–Markov / least squares) is:
c1 = a1 / (a1^2 + a2^2),
c2 = a2 / (a1^2 + a2^2).
This estimator equals (a1 y1 + a2 y2) / (a1^2 + a2^2).
- Maximum Likelihood Estimate (MLE):
Because the errors are Gaussian with equal known variance and independent, the MLE for β is identical to the BLUE above. Thus the MLE uses the same weights c1, c2 as given.
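A minimal sketch of the BLUE weights, with illustrative regressor values a1 = 2, a2 = 3 (assumptions, not from the text):

```python
# Closed-form BLUE / least-squares weights from the constrained minimization above.
# The regressor values a1 = 2, a2 = 3 are illustrative assumptions.

def blue_weights(a1, a2):
    denom = a1**2 + a2**2
    return a1 / denom, a2 / denom

a1, a2 = 2.0, 3.0
c1, c2 = blue_weights(a1, a2)

print(c1, c2)                      # 2/13 and 3/13
print(round(c1 * a1 + c2 * a2, 9)) # 1.0: the unbiasedness constraint holds
print(c1**2 + c2**2)               # 1/13: the minimized variance
```

Because the denominator a1^2 + a2^2 grows with the regressors, a measurement taken with a larger |a_i| (a more informative geometry) receives proportionally more weight and the variance 1/(a1^2 + a2^2) shrinks.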