Statistical Inference & Hypothesis Testing Concepts
Parametric Inference Fundamentals
The probability distribution of the population under study is known, except for a finite number of parameters, and the goal is to estimate those parameters. Examples include the t-test and ANOVA.
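As a minimal sketch of a parametric test, here is a two-sample t-test in Python with SciPy; the sample values are invented purely for illustration.

```python
from scipy import stats

# Two hypothetical samples, assumed to come from normal populations
# whose means (the unknown parameters) we want to compare.
group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8]
group_b = [5.6, 5.8, 5.5, 5.9, 5.7, 5.4]

# Two-sample t-test: a parametric test on the difference of means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```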
Non-Parametric Inference Basics
The distribution of the population is not known. It is used to test the assumptions of parametric methods, for example, to check if the population distribution is normal.
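For instance, a Shapiro-Wilk test can be used to check the normality assumption before applying a parametric method. A sketch with made-up data:

```python
from scipy import stats

sample = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3, 5.1]

# H0: the sample comes from a normal distribution.
w_stat, p_value = stats.shapiro(sample)
print(f"W = {w_stat:.3f}, p-value = {p_value:.4f}")
# A small p-value would lead us to reject normality.
```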
What is a Statistic?
A function of the sample (itself a random variable) that does not depend on the unknown parameter.
Understanding Estimators
A statistic whose values are acceptable for estimating an unknown parameter.
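A standard example: the sample mean is a statistic (it depends only on the sample values) and is the usual estimator of the population mean μ.

```latex
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat{\mu} = \bar{X}
```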
Unbiasedness in Estimation
The estimator does not systematically overestimate or underestimate the parameter; equivalently, the bias of the estimation is zero.
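In symbols (writing θ̂ for the estimator and θ for the parameter), unbiasedness means the expected value of the estimator equals the parameter; a classic example is the sample variance with the n − 1 divisor.

```latex
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta = 0,
\qquad
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \sigma^2
```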
Efficiency of Estimators
The risk of deviating too far from the true parameter value should be low.
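In terms of variance: between two unbiased estimators of the same parameter, the more efficient one is the one with the smaller variance.

```latex
\hat{\theta}_1 \text{ is more efficient than } \hat{\theta}_2
\iff
\operatorname{Var}(\hat{\theta}_1) \le \operatorname{Var}(\hat{\theta}_2)
```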
Consistency in Statistical Estimation
If we increase the sample size, the corresponding estimate should become more accurate, since we have more information.
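A small simulation sketch (with an arbitrary true mean of 10 and normal data, both chosen only for illustration) shows the sample mean settling around the true value as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 10.0

# The sample mean as an estimator of the population mean:
# as n grows, the estimate gets closer to the true value.
for n in (10, 100, 1000, 10000):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    print(n, round(sample.mean(), 3))
```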
Defining Confidence Intervals
To provide an adequate approximation of the parameter under study, we use a confidence interval: a range of values that contains the parameter with a stated confidence level.
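A sketch of a 95% confidence interval for a population mean, based on the t distribution (the sample values are invented and the population is assumed normal with unknown variance):

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3, 5.1])

# 95% confidence interval for the population mean.
mean = sample.mean()
sem = stats.sem(sample)                # standard error of the mean
ci = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(ci)
```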
Hypothesis Testing Explained
It attempts to decide whether a specific hypothesis about the distribution under study is supported or rejected by the sample data.
Null Hypothesis (H0)
H0 is the hypothesis being tested. It is called the null hypothesis because it represents the claim we consider true unless the sample data clearly show the opposite.
Alternative Hypothesis (H1)
If H1 is accepted, it is because the sample data clearly indicate that H0 is not true; thus, H1 is the opposite of H0.
Type I Error (Alpha Level)
This occurs when we reject H0 when it is actually true; its probability is the significance level (α).
Type II Error (Beta Level)
This occurs when we do not reject H0 when it is actually false; its probability is denoted beta (β). The statistical power of the test is 1 - β.
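Both error rates can be estimated by simulation. The sketch below (with an arbitrary sample size, effect size, and number of trials chosen only for illustration) repeats a t-test many times, first with H0 true to estimate α, then with H0 false to estimate the power 1 − β:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, trials = 0.05, 30, 2000

# Estimate the Type I error rate: H0 is true (both means equal).
false_rejections = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
    for _ in range(trials)
)

# Estimate the power (1 - beta): H0 is false (means differ by 0.5).
correct_rejections = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue < alpha
    for _ in range(trials)
)

print("estimated alpha:", false_rejections / trials)
print("estimated power:", correct_rejections / trials)
```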
Defining the Critical Region
The set of values of the test statistic for which we reject H0.
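For example, in a two-sided test based on a standardized statistic Z at significance level α, the critical region is the two tails:

```latex
C = \{\, z : |z| > z_{\alpha/2} \,\}
```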
ANOVA: Main Objective
To test whether there are differences between the means of the different factor levels.
ANOVA: Core Problem Addressed
Given n elements that differ only in one factor, a continuous feature (the response variable) is observed, which varies randomly from one element to another. We want to know whether there is a relationship between the mean value of this feature and the level of the factor.
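A sketch of a one-way ANOVA in Python; the three groups of response values are invented and stand for three levels of the factor:

```python
from scipy import stats

# Hypothetical response values measured at three levels of a factor.
level_1 = [20.1, 19.8, 20.5, 20.0]
level_2 = [21.2, 21.0, 20.8, 21.5]
level_3 = [19.5, 19.9, 19.4, 19.7]

# One-way ANOVA: H0 says the means of all levels are equal.
f_stat, p_value = stats.f_oneway(level_1, level_2, level_3)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")
```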
Regression Analysis Purpose
To estimate an unknown model and check if it is appropriate. This often involves assumptions like normality, homoscedasticity, randomness, and linearity.
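A minimal simple-linear-regression sketch (the x and y values are made up): it fits the model, then computes the residuals used to check whether the model is appropriate.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit a simple linear regression y = a + b*x.
result = stats.linregress(x, y)
predicted = result.intercept + result.slope * x
residuals = y - predicted          # observed minus predicted values

print("slope:", round(result.slope, 3), "intercept:", round(result.intercept, 3))
print("residuals:", np.round(residuals, 3))
```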
ANOVA vs. Kruskal-Wallis Test
Parametric inference (ANOVA) assumes that the observations in each group are random samples from normal distributions described by unknown parameters. The Kruskal-Wallis test, its non-parametric counterpart, does not require these assumptions.
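On the same invented data used in the ANOVA sketch above, the non-parametric alternative looks like this:

```python
from scipy import stats

level_1 = [20.1, 19.8, 20.5, 20.0]
level_2 = [21.2, 21.0, 20.8, 21.5]
level_3 = [19.5, 19.9, 19.4, 19.7]

# Kruskal-Wallis: a rank-based test of the same question as one-way ANOVA,
# without assuming normality of the response within each level.
h_stat, p_value = stats.kruskal(level_1, level_2, level_3)
print(f"H = {h_stat:.3f}, p-value = {p_value:.4f}")
```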
Understanding Residuals
The difference between the observed and predicted values in a statistical model.
SC-Inter (SSTR): Between-Treatment Variability
Represents the variability between the treatment means and the global mean.
SC-Intra (SSE): Within-Treatment Variability
Represents the part of the variability not explained by the treatment.
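In formulas (with k treatment levels, n_i observations in level i, ȳ_i· the mean of level i, and ȳ·· the global mean), the two sums of squares decompose the total variability:

```latex
SSTR = \sum_{i=1}^{k} n_i\,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2,
\qquad
SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2,
\qquad
SST = SSTR + SSE
```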