Optimal Estimators, Dice Posterior & Statistical Problems

Posted by Anonymous and classified in Mathematics


Combine Independent Unbiased Estimators

Let d1 and d2 be independent unbiased estimators of θ with variances σ1^2 and σ2^2, respectively:

  • E[di] = θ for i = 1, 2.
  • Var(di) = σi^2.

For any constant λ, the estimator d = λ d1 + (1 - λ) d2 is also unbiased.

The variance (mean square error for an unbiased estimator) is
Var(d) = λ^2 σ1^2 + (1 - λ)^2 σ2^2.

To minimize Var(d) with respect to λ, differentiate and set to zero:

d/dλ Var(d) = 2λ σ1^2 - 2(1 - λ) σ2^2 = 0.

Solving gives the optimal weight

λ* = σ2^2 / (σ1^2 + σ2^2).
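As a quick numerical check, the optimal weight and the resulting minimum variance σ1^2 σ2^2 / (σ1^2 + σ2^2) can be verified by simulation. A minimal sketch, using Gaussian estimators purely for illustration (the variance formula holds for any distributions with the stated variances) and hypothetical values σ1 = 1, σ2 = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0
s1, s2 = 1.0, 3.0                 # hypothetical standard deviations of d1 and d2

# Optimal weight lambda* = sigma2^2 / (sigma1^2 + sigma2^2)
lam = s2**2 / (s1**2 + s2**2)

# Simulate the two unbiased estimators as Gaussians (illustration only)
d1 = rng.normal(theta, s1, 200_000)
d2 = rng.normal(theta, s2, 200_000)
d = lam * d1 + (1 - lam) * d2

# Theoretical minimum variance: sigma1^2 * sigma2^2 / (sigma1^2 + sigma2^2)
var_theory = (s1**2 * s2**2) / (s1**2 + s2**2)
print(lam, var_theory)            # 0.9 and 0.9
print(d.mean(), d.var())          # ≈ theta and ≈ 0.9
```

With these values λ* = 9/10, and the empirical variance of the combined estimator matches the theoretical minimum of 0.9.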


Question 1: Posterior PMF for a Third Dice Roll

Assume there are five dice with sides {4, 6, 8, 12, 20}. One of these five dice is selected uniformly at random (probability 1/5) and rolled twice. The two observed results are 5 and 9. What is the posterior probability mass function (pmf) for the outcome of a third roll?

Since only the 12-sided and 20-sided dice can produce both results 5 and 9, the posterior probabilities for the chosen die D are:

  • P(D = 12 | 5,9) = 25/34
  • P(D = 20 | 5,9) = 9/34
  • P(D = k | 5,9) = 0 for k in {4,6,8}.

Therefore the pmf of a third roll X3 given the two observations is, for integer x:

  • For x in {1,2,...,12}:
    P(X3 = x | 5,9) = (25/34) * (1/12) + (9/34) * (1/20) = 19/255.
  • For x in {13,14,...,20}:
    P(X3 = x | 5,9) = (9/34) * (1/20) = 9/680.
  • Otherwise: P(X3=x|5,9) = 0.
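The Bayes computation above can be reproduced exactly with rational arithmetic. A sketch using Python's fractions module:

```python
from fractions import Fraction

dice = [4, 6, 8, 12, 20]
obs = [5, 9]

# Likelihood of both observations for each die: (1/n)^2 if the die can
# show every observed face, otherwise 0
lik = {n: Fraction(1, n) ** 2 if all(o <= n for o in obs) else Fraction(0)
       for n in dice}
total = sum(lik.values())
posterior = {n: l / total for n, l in lik.items()}
print(posterior[12], posterior[20])     # 25/34 and 9/34

def pmf3(x):
    """pmf of a third roll, averaging over the posterior on the die."""
    return sum(posterior[n] * Fraction(1, n) for n in dice if x <= n)

print(pmf3(5), pmf3(15))                # 19/255 and 9/680
```

Summing pmf3(x) over x = 1, …, 20 gives exactly 1, confirming the posterior predictive is a proper pmf.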

Question 2: Sum of 1000 Fair Dice Rolls (CLT Expression)

Consider a fair six-sided die with faces 1 through 6. Let S be the sum of 1000 independent tosses. Using the Central Limit Theorem, express the approximate probability P(3000 ≤ S ≤ 4000) as an integral over the standard normal density.

For a single fair die:

  • Mean per roll: μ = 3.5.
  • Variance per roll: σ^2 = Var(X) = 35/12.

For S = sum of 1000 iid rolls: E[S] = 1000μ = 3500 and Var(S) = 1000 * (35/12). Let σS = sqrt(1000 * 35/12).

By the CLT, approximately

P(3000 ≤ S ≤ 4000) ≈ integral from a to b of phi(z) dz, where

  • a = (3000 - 3500) / σS,
  • b = (4000 - 3500) / σS,
  • and phi(z) is the standard normal density phi(z) = (1 / sqrt(2π)) e^{-z^2/2}.

Equivalently:

P(3000 ≤ S ≤ 4000) ≈ ∫ from (3000 - 3500)/σS to (4000 - 3500)/σS of (1 / √(2π)) e^{-z^2/2} dz.
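Evaluating this normal approximation numerically is straightforward; a sketch using the error function for the standard normal CDF:

```python
from math import sqrt, erf

mu, var = 3.5, 35 / 12
sigma_S = sqrt(1000 * var)        # ≈ 54.0

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

a = (3000 - 3500) / sigma_S       # ≈ -9.26
b = (4000 - 3500) / sigma_S       # ≈ +9.26
print(Phi(b) - Phi(a))            # essentially 1.0
```

Since the interval extends about 9.3 standard deviations on either side of the mean 3500, the approximate probability is essentially 1.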


Question 3: Convex Combination of Two Unbiased Estimators

Let T1 and T2 be two independent unbiased estimators of the mean μ (i.e., E[T1] = E[T2] = μ).

  1. Show that for any 0 ≤ λ ≤ 1, the estimator T = λ T1 + (1 - λ) T2 is unbiased.
  2. Assume Var(T1) = σ1^2 and Var(T2) = σ2^2. Find the value of λ that minimizes the mean square error (equivalently, the variance) of T.

Solution sketch:

  • Unbiasedness: E[T] = λ E[T1] + (1-λ) E[T2] = λμ + (1-λ)μ = μ.
  • Variance: Var(T) = λ^2 σ1^2 + (1 - λ)^2 σ2^2. Minimize with respect to λ to obtain
    λ = σ2^2 / (σ1^2 + σ2^2).

Question 4: Hypothesis Test for a Fair Coin

We test the null hypothesis that a coin is fair, H0: p = 1/2. The rejection rule is: in 5 independent flips, reject H0 if at least 4 outcomes are of the same type (i.e., 4 or 5 heads, or 4 or 5 tails).

  1. Significance level (α):
    Under H0 (p = 1/2),
    α = 2 * [C(5,4) (1/2)^5 + C(5,5) (1/2)^5] = 2 * (5/32 + 1/32) = 12/32 = 3/8 ≈ 0.375.
  2. Power when HA: p = 2/3:
    Let p = P(heads) = 2/3. The power is P(reject H0 | p = 2/3) = P(4 heads) + P(5 heads) + P(4 tails) + P(5 tails).
    Explicitly:
    P = 5 (2/3)^4 (1/3) + (2/3)^5 + 5 (1/3)^4 (2/3) + (1/3)^5.
    This simplifies to 123/243 = 41/81 ≈ 0.50617.
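Both the significance level and the power follow from the same binomial tail probability, so they can be checked exactly with rational arithmetic. A sketch (the helper name reject_prob is ours):

```python
from fractions import Fraction
from math import comb

def reject_prob(p):
    """P(at least 4 of 5 flips show the same face) when P(heads) = p."""
    q = 1 - p
    return (comb(5, 4) * p**4 * q + p**5 +
            comb(5, 4) * q**4 * p + q**5)

alpha = reject_prob(Fraction(1, 2))   # significance level under H0
power = reject_prob(Fraction(2, 3))   # power under HA
print(alpha, power)                   # 3/8 and 41/81
```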

Question 5: Location PDF f(x) = 0.5 e^{-|x - θ|}

Assume the probability density function is the Laplace (double exponential) location family:

f(x) = 0.5 e^{-|x - θ|}, for x in R and parameter θ (location).

  1. Mean: Because the distribution is symmetric about θ, E[X] = θ.
  2. MSE for an estimator given N iid samples:
    The maximum likelihood estimator (and the minimum-mean-absolute-deviation estimator) for the location parameter is the sample median. For symmetric distributions, the sample median is an unbiased estimator of θ (for odd N) and the MSE is Var(median) (since bias = 0). A closed-form variance depends on N and the underlying density; in general
    MSE(median) = Var(median) = 1 / (4 N f(θ)^2) + o(1/N) asymptotically, where f(θ) is the pdf evaluated at the true location (here f(θ) = 0.5).
  3. N = 3 and samples 1, 3, 5:
    The sample median is 3, so the MLE is θMLE = 3. The MSE of this estimator as a random quantity depends on the true θ, but the point estimate is 3.
  4. General MLE for N samples:
    The MLE of θ given iid Laplace samples is any median of the sample (for odd N the unique sample median). For even N, any value between the two middle order statistics maximizes the likelihood.
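The median MLE and its asymptotic variance 1 / (4 N f(θ)^2) = 1/N (since f(θ) = 0.5 here) can be illustrated numerically; a Monte Carlo sketch, with the replication count and N chosen arbitrarily:

```python
import numpy as np

# MLE of the Laplace location parameter: the sample median (here N = 3)
theta_mle = float(np.median([1, 3, 5]))
print(theta_mle)                                  # 3.0

# Monte Carlo check of the asymptotic MSE 1 / (4 N f(theta)^2) = 1/N,
# using f(theta) = 0.5 for the standard Laplace density (scale 1)
N = 1000
rng = np.random.default_rng(1)
medians = np.median(rng.laplace(0.0, 1.0, size=(2000, N)), axis=1)
print(medians.var(), 1 / N)                       # both ≈ 0.001
```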

Question 6: Combining Two Measurements to Estimate Building Height

We estimate a building-height parameter β from two measurements taken at known distances, with a known gain function a(ρ). For simplicity denote a1 = a(ρ1) and a2 = a(ρ2). The observations are:

  • y1 = a1 β + ε1,
  • y2 = a2 β + ε2,

where ε1, ε2 are independent N(0,1) errors.

An engineer forms the weighted sum c1 y1 + c2 y2 to estimate β. We seek weights that minimize MSE.

  1. Minimize MSE under unbiasedness (BLUE):
    For unbiasedness require c1 a1 + c2 a2 = 1. Minimize Var(c1 y1 + c2 y2) = c1^2 + c2^2 subject to the constraint. The solution (Gauss–Markov / least squares) is:
    c1 = a1 / (a1^2 + a2^2),
    c2 = a2 / (a1^2 + a2^2).
    This estimator equals (a1 y1 + a2 y2) / (a1^2 + a2^2).
  2. Maximum Likelihood Estimate (MLE):
    Because the errors are Gaussian with equal known variance and independent, the MLE for β is identical to the BLUE above. Thus the MLE uses the same weights c1, c2 as given.
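The BLUE weights and the resulting variance 1 / (a1^2 + a2^2) can be verified by simulation; a sketch with hypothetical values a1 = 2, a2 = 3 and β = 1.5:

```python
import numpy as np

a1, a2 = 2.0, 3.0              # hypothetical values of a(rho1), a(rho2)
beta = 1.5                     # true parameter, used only to simulate data

# BLUE (= MLE here) weights
denom = a1**2 + a2**2
c1, c2 = a1 / denom, a2 / denom

rng = np.random.default_rng(2)
y1 = a1 * beta + rng.normal(0.0, 1.0, 100_000)   # N(0,1) measurement errors
y2 = a2 * beta + rng.normal(0.0, 1.0, 100_000)
beta_hat = c1 * y1 + c2 * y2

# Unbiased, with Var(beta_hat) = c1^2 + c2^2 = 1 / (a1^2 + a2^2)
print(beta_hat.mean(), beta_hat.var(), 1 / denom)
```

The empirical mean matches β and the empirical variance matches 1/13, the Gauss–Markov minimum among unbiased linear combinations.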
