Understanding Statistical Concepts and Hypothesis Testing
Key Statistical Concepts
Variables
Categorical (Qualitative) Variables: Represent characteristics or qualities. Examples include race and gender.
Quantitative Variables: Represent numerical values that differ in magnitude.
Scales of Measurement
- Nominal: Unordered categories.
- Ordinal: Ordered categories.
- Interval: Equal intervals between values.
Data Visualization
Frequency Table: Lists the possible values of a variable and the number of times each occurs.
Histogram: A bar graph of frequencies or percentages. Shapes can be bell-shaped, skewed, or bimodal.
Box Plot: Displays the distribution of data, including median, quartiles, and outliers.
Scatter Plot: Shows the relationship between two variables.
Distribution
- For symmetric distributions, the mean equals the median.
- For skewed distributions, the mean is pulled in the direction of the skew. The median is generally preferred for skewed distributions.
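The pull on the mean can be seen with a small sketch, using a made-up right-skewed sample (one large value stands in for the long tail):

```python
import statistics

# Hypothetical right-skewed sample: one large value pulls the mean upward
values = [20, 22, 23, 25, 26, 28, 95]

mean = statistics.mean(values)      # pulled toward the long right tail
median = statistics.median(values)  # resistant to the extreme value

print(mean, median)  # mean ≈ 34.1, median = 25
```

Here the mean exceeds the median, which is why the median is usually reported for skewed data.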
Standard Deviation
The standard deviation represents the "typical" distance from the mean. It is calculated as s = √s², the square root of the variance s².
Empirical Rule:
- Approximately 68% of data falls within 1 standard deviation of the mean.
- Approximately 95% of data falls within 2 standard deviations of the mean.
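The empirical rule can be checked by simulation. This sketch draws a large, roughly normal sample and counts the share of values within 1 and 2 standard deviations of the mean:

```python
import random

random.seed(0)
# Simulate a large, approximately normal sample (mean 0, SD 1)
sample = [random.gauss(0, 1) for _ in range(100_000)]

within_1sd = sum(abs(x) <= 1 for x in sample) / len(sample)
within_2sd = sum(abs(x) <= 2 for x in sample) / len(sample)

print(round(within_1sd, 2), round(within_2sd, 2))  # close to 0.68 and 0.95
```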
Hypothesis Testing
Hypothesis testing involves proposing a model for the population and checking whether the observed data are consistent with that model.
- Null Hypothesis: There is no effect.
- Alternative Hypothesis: There is an effect.
- Type I Error: Rejecting the null hypothesis when it is true.
- Type II Error: Retaining the null hypothesis when it is false.
Significance Level (α)
The significance level (α) is the proportion of times one can expect to reject the null hypothesis when it's true in repeated, randomly drawn samples of the same size from the population.
P-value and Confidence Interval
P-value: The smallest significance level at which the null hypothesis can be rejected.
Confidence Interval: A 95% confidence interval for the population mean is ȳ ± 1.96(s/√n), where ȳ is the sample mean. If the hypothesized value falls outside this range, reject the null hypothesis at the 5% level.
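This decision rule can be sketched with a hypothetical sample (the data below are made up; for small n, a t-multiplier would strictly be more appropriate than 1.96, but the notes' large-sample formula is used here):

```python
import math
import statistics

# Hypothetical sample of measurements
y = [9.1, 10.4, 9.8, 10.9, 10.2, 9.5, 10.7, 10.1, 9.9, 10.3]

n = len(y)
mean = statistics.mean(y)
s = statistics.stdev(y)   # sample standard deviation
se = s / math.sqrt(n)     # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se

# Test H0: population mean = 9.0 at the 5% level
hypothesized = 9.0
reject = not (lower <= hypothesized <= upper)
print(f"95% CI: ({lower:.2f}, {upper:.2f}); reject H0: {reject}")
```

Because 9.0 falls outside the interval, the null hypothesis is rejected.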
Correlation and Regression
Standard Error of the Mean
Calculated as s/√n.
- The standard error of the sample mean estimates how far the sample mean is likely to be from the population mean.
- The standard deviation of the sample indicates how much individuals within the sample differ from the sample mean.
- SD is always greater than or equal to 0; it increases with more variation around the mean, and outliers affect s.
Correlation
Measures the strength and direction of the relationship between two variables.
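The contrast between the standard deviation and the standard error can be sketched by simulation: the SD stays near the population value regardless of sample size, while the standard error shrinks as n grows (samples below are simulated, not real data):

```python
import math
import random
import statistics

random.seed(1)

# Two hypothetical samples from the same population (mean 50, SD 10)
small = [random.gauss(50, 10) for _ in range(25)]
large = [random.gauss(50, 10) for _ in range(2500)]

# The sample SD stays near the population SD (10) for both samples...
sd_small, sd_large = statistics.stdev(small), statistics.stdev(large)

# ...but the standard error of the mean shrinks as n grows
se_small = sd_small / math.sqrt(len(small))
se_large = sd_large / math.sqrt(len(large))

print(round(se_small, 2), round(se_large, 2))
```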
Covariance
If deviations from the mean in one variable tend to be accompanied by deviations from the mean in the other, the variables are likely related. The standardized version of covariance is the Pearson Correlation Coefficient (r), which varies between -1 and +1.
- ±0.1 = small effect
- ±0.3 = medium effect
- ±0.5 = large effect
Coefficient of Determination (r²): The proportion of variance in one variable that is shared with the other. Used to assess regression fit.
Regression
Regression is a method of predicting the value of one variable from another. It involves a hypothetical model of the relationship between variables.
The linear model is y = b0 + b1x + e, where y is the outcome (DV), x is the predictor (IV), b1 is the slope (strength of the relationship), b0 is the intercept, and e is the error term. Regression is used to describe an overall theoretical linear relationship between two variables.
Example:
Population Model: Exam performance = b0 + b1(time revising) + e
Prediction Model: Predicted exam performance = b0 + b1(time revising), with no error term, since predictions use the estimated coefficients.
Interpreting the Slope: For each additional hour spent revising, exam performance is predicted to increase by about 0.57%.
Interpreting the Intercept: If a student spent 0 hours revising (so X = 0), they are predicted to score about 45% on the exam.
For example, a student who spent 15 hours revising is predicted to score about 53.87% (the exact figure comes from the unrounded estimated coefficients, not the rounded values 0.57 and 45).