Essential Statistical Concepts and Probability Methods
Posted by Anonymous and classified in Mathematics
Common Statistical Biases
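- Sampling bias: The sample is not representative of the population.
- Non-response bias: People who respond may differ systematically from those who do not, so a low response rate (for example, only 24% of surveys returned) can skew the results.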
Sampling Techniques
- Simple Random Sampling (SRS): 1) Every member of the population has the same chance of being included (representative). 2) Members are chosen independently.
- Random Cluster Sampling: 1) Divide the population into smaller geographical sectors. 2) Take an SRS of sectors. 3) Count every member in the selected sectors and scale appropriately.
- Stratified Random Sampling: 1) Divide population into groups based on criteria like age or income. 2) Perform an SRS of each group and scale appropriately.
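The three techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, its `sector` and `age_group` fields, and the function names are all hypothetical, and the final "scale appropriately" step is left out.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 people, each with a sector and an age group.
population = [
    {"id": i, "sector": i % 20, "age_group": ("young", "middle", "older")[i % 3]}
    for i in range(1000)
]

def simple_random_sample(members, n):
    """SRS: every member has the same chance of being chosen."""
    return random.sample(members, n)

def cluster_sample(members, n_sectors):
    """Take an SRS of sectors, then keep every member of the chosen sectors."""
    sectors = sorted({m["sector"] for m in members})
    chosen = set(random.sample(sectors, n_sectors))
    return [m for m in members if m["sector"] in chosen]

def stratified_sample(members, n_per_group):
    """Take an SRS of the same size within each age group."""
    groups = {}
    for m in members:
        groups.setdefault(m["age_group"], []).append(m)
    sample = []
    for group_members in groups.values():
        sample.extend(random.sample(group_members, n_per_group))
    return sample

srs = simple_random_sample(population, 60)
clusters = cluster_sample(population, 4)
strata = stratified_sample(population, 20)
print(len(srs), len(clusters), len(strata))
```

Note the difference in what gets randomized: cluster sampling picks whole sectors at random, while stratified sampling picks individuals at random inside every group.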
Data Variables and Distributions
- Variable Types: Categorical and Numeric (discrete and continuous).
- Relative frequency: Count / sample size.
- Skewed to right: The tail is longer on the right.
- Bimodal distribution: Has two distinct peaks (the "twin towers" shape).
- Robustness: A statistic is robust if it is relatively unaffected by changes in a small portion of the dataset. The median and IQR are robust; the mean and range are not.
- IQR: Q3 - Q1.
- Outliers: Values below the lower fence, Q1 - (1.5 x IQR), or above the upper fence, Q3 + (1.5 x IQR).
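A rough sketch of the IQR, the 1.5 x IQR fences, and robustness, using Python's standard `statistics` module. The data values are made up for illustration, and quartile conventions differ between textbooks and software, so Q1 and Q3 from `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 40]

# Quartiles (the default method of statistics.quantiles is one of several conventions).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("IQR:", iqr, "fences:", lower_fence, upper_fence, "outliers:", outliers)

# Robustness check: drop the extreme value and compare how far each statistic moves.
trimmed = [x for x in data if x != 40]
mean_shift = abs(statistics.mean(data) - statistics.mean(trimmed))
median_shift = abs(statistics.median(data) - statistics.median(trimmed))
print("mean shift:", mean_shift, "median shift:", median_shift)
```

Removing the single extreme value moves the mean far more than the median, which is what "robust" means in practice.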
Probability Rules
- Complement: Probability of an event not happening is 1 - the probability that it does happen.
- Addition rule: Pr(E1 or E2) = Pr(E1) + Pr(E2) - Pr(E1 and E2).
- Disjoint: The events have no overlap, so Pr(E1 and E2) = 0.
- Conditional probability: Pr(E2|E1) = Pr(E2 and E1) / Pr(E1).
- Multiplication rule: Pr(E1 and E2) = Pr(E1) x Pr(E2|E1).
- Independence: Events E1 and E2 are independent if Pr(E1 and E2) = Pr(E1) x Pr(E2). Two events are independent if knowing that one event occurred does not change the probability of the other event occurring.
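The rules above can be checked on a small example. The sketch below assumes one roll of a fair six-sided die, with E1 = "the roll is even" and E2 = "the roll is greater than 4"; the die, the events, and the helper `pr` are chosen for illustration and are not part of the notes.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}  # the roll is even
E2 = {5, 6}     # the roll is greater than 4

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_not_e1 = 1 - pr(E1)                         # complement rule
p_either = pr(E1) + pr(E2) - pr(E1 & E2)      # addition rule
p_e2_given_e1 = pr(E1 & E2) / pr(E1)          # conditional probability
p_both = pr(E1) * p_e2_given_e1               # multiplication rule
independent = pr(E1 & E2) == pr(E1) * pr(E2)  # independence check

print(p_not_e1, p_either, p_e2_given_e1, p_both, independent)
```

Here Pr(E2|E1) equals Pr(E2), so knowing the roll is even does not change the chance that it is greater than 4, and the two events are independent.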
Statistical Analysis and Binomial Examples
Note: There is not enough information in a boxplot to determine standard deviation (SD).
Example: Assume 14% of men are colorblind and an SRS of 10 men is taken. Let x be the number of colorblind men in the sample:
- Exactly one of 10 men is colorblind: P(x=1) = 10 choose 1 * (0.14^1) * (0.86^9) = 0.3603.
- One or two of 10 men are colorblind: P(x=1 or x=2) = P(x=1) + P(x=2). P(x=2) = 10 choose 2 * (0.14^2) * (0.86^8) = 0.2639. Total = 0.3603 + 0.2639 = 0.6242.
- One or more of 10 men are colorblind: P(x>=1) = 1 - P(x=0). P(x=0) = (0.14^0) * (0.86^10) = 0.2213, so P(x>=1) = 1 - 0.2213 = 0.7787.
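The colorblindness example can be reproduced with a short binomial calculation. This is a minimal sketch using only the standard library; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(x = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.14  # 10 men, 14% assumed colorblind

p_one = binom_pmf(1, n, p)
p_two = binom_pmf(2, n, p)
p_one_or_two = p_one + p_two              # the two outcomes are disjoint, so add
p_at_least_one = 1 - binom_pmf(0, n, p)   # complement of "none are colorblind"

print(round(p_one, 4))           # 0.3603
print(round(p_two, 4))           # 0.2639
print(round(p_one_or_two, 4))    # 0.6242
print(round(p_at_least_one, 4))  # 0.7787
```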
Standard Error and CLT
Formula: SE = s / sqrt(n), where s is the sample standard deviation and n is the sample size.
Impact of Distribution: Using z-scores to calculate a probability relies on the assumption that the quantity being observed is normally distributed. However, when looking at the distribution of the sample mean rather than the distribution of individual observations, the Central Limit Theorem (CLT) applies. As the sample size increases, the sampling distribution of the sample mean approaches a normal distribution regardless of the shape of the population distribution. In this case, a sample size of n=100 is large enough.
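A small simulation can make the CLT claim concrete. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1 and SD 1, chosen only for illustration), draws many samples of size n=100, and looks at how the sample means behave.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable

n = 100             # sample size, as in the notes
num_samples = 2000  # number of repeated samples (an arbitrary choice)

# Each sample comes from a skewed population: exponential with mean 1 and SD 1.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

se = 1.0 / sqrt(n)  # population SD / sqrt(n) = 0.1
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# If the sample means are roughly normal, about 95% fall within 1.96 SE of 1.
within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / num_samples

print("mean of sample means:", center)
print("SD of sample means:", spread, "theoretical SE:", se)
print("fraction within 1.96 SE:", within)
```

The individual observations are far from normal, yet the sample means cluster around 1 with a spread close to 0.1, which is what justifies using z-scores for the mean.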
Confidence Intervals: A 95% confidence level means that if many samples were taken and a 95% CI was calculated for each, then around 95% of these intervals would contain the true population mean. The interval estimates the population mean, not the range of individual values, so individual measurements in one SRS falling inside or outside the interval do not confirm or deny its correctness.
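The repeated-sampling interpretation can be illustrated with a simulation. This is a sketch under assumptions not in the notes: a normal population with mean 50 and SD 10, samples of size 100, and the normal critical value 1.96 for a 95% interval.

```python
import random
import statistics
from math import sqrt

random.seed(2)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 10.0  # hypothetical population parameters
n, repetitions = 100, 1000

def confidence_interval(sample):
    """Approximate 95% CI for the mean: xbar +/- 1.96 * s / sqrt(n)."""
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / sqrt(len(sample))
    return xbar - 1.96 * se, xbar + 1.96 * se

covered = 0
for _ in range(repetitions):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    low, high = confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / repetitions

# For the last sample only: share of individual measurements inside its CI.
inside = sum(low <= x <= high for x in sample) / n

print("share of intervals containing the true mean:", coverage)
print("share of individual values inside one interval:", inside)
```

Coverage should land near 0.95, while only a small share of the individual measurements sit inside any one interval, since the interval describes the mean rather than the spread of the data.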