Fundamentals of Statistical Measurement and Data Analysis
Chapter 1: Understanding Variables
Types of Variables
- Categorical: Smoking status (current, former, never)
- Ordinal: Non-smoker, light, moderate, or heavy smoker (ordered categories)
- Quantitative: BMI, Age, Weight (numerical measurements)
Key Definitions
- Observation: The unit on which measurements are made (an individual or an aggregate).
- Variable: The generic characteristic we measure (e.g., age).
- Value: A realized measurement (e.g., 27).
Chapter 2: Statistical Studies
Surveys: Census and Sampling
- Goal: Describe population characteristics.
- Census: Attempts to reach the entire population (costly, time-consuming).
- Sampling: Uses a sample of the population (allows for inferences, saves time and money).
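- Simple Random Sampling: Every member of the population has the same probability of being selected, and every possible sample of size n is equally likely.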
- Issues with Sampling: Under-coverage, volunteer bias, and nonresponse bias.
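A simple random sample can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade procedure: the sampling frame of ID numbers 1 to 100, the sample size of 10, and the function name `simple_random_sample` are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: ID numbers 1..100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Note that random selection only protects against selection bias within the frame. If the frame itself misses part of the population (under-coverage), or selected units do not respond, the biases listed above remain.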
Comparative Studies: Experimental and Non-experimental
These studies examine the relationship between an explanatory variable and a response variable (e.g., a study on whether weight gain causes hypertension).
- Experimental: The investigator assigns subjects to groups defined by the explanatory variable (exposed/unexposed).
- Non-experimental: Subjects are not assigned; they are merely classified as exposed/nonexposed.
Blinding Techniques
- Single Blinding: Subjects are unaware of the specific treatment they receive.
- Double Blinding: Subjects and investigators are unaware of the treatment assignments.
- Triple Blinding: Subjects, investigators, and statisticians are unaware of the treatment assignments.
Chapter 3: Data Visualization and Distribution Shapes
Visualizing Data
Stem Plot: [figures not recovered: a stem plot, shown upright and rotated]
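A stem plot splits each value into a stem (the leading digits) and a leaf (the final digit) and lists the leaves beside their stem. The sketch below assumes non-negative integers with the tens as the stem; the `stem_plot` helper and the `ages` data are hypothetical.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem plot lines: tens digit(s) as stem, units digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems (gaps) stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem}|{leaves}")
    return lines

ages = [21, 23, 24, 27, 30, 35, 52]   # hypothetical data
print("\n".join(stem_plot(ages)))
# 2|1347
# 3|05
# 4|
# 5|2
```

Turning the printed plot on its side gives a picture much like a histogram, which is why the rotated view is useful for judging shape.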
Distributional Shapes
- Modality: The number of peaks in the distribution.
- Kurtosis: The steepness or peakedness of the distribution.
Frequency Tables and Charts
Frequency Table
- Histogram: Used for displaying quantitative measurements.
- Bar Chart: Used for displaying categorical measurements.
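A frequency table for a categorical variable lists each category with its count and relative frequency; these are the numbers a bar chart would display. A minimal sketch, with hypothetical smoking-status data and a hypothetical `frequency_table` helper:

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, relative frequency) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(category, count, count / n)
            for category, count in counts.most_common()]

smoking = ["never", "current", "never", "former", "never", "current"]  # hypothetical
table = frequency_table(smoking)
for category, count, rel in table:
    print(f"{category:<8} {count:>3} {rel:6.1%}")
```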
Chapter 4: Measures of Central Location and Spread
Descriptive Measures
- Central Location: Mean, Median, Mode
- Spread: Range, IQR, Variance, and Standard Deviation
Notation
- n = Sample size
- N = Population size
- X = Variable (e.g., ages of subjects)
- x_i = The value of X for individual i
- Σ = Sum of all values (Capital Sigma)
Measures of Central Location
Mean: The Arithmetic Average ("x-bar")
The mean is the balancing point of a distribution.
Median: The Middle Value
The median is more robust in the face of outliers or errors.
Relationship Between Mean and Median (Skew)
- Symmetrical Distribution: Mean = Median (and = Mode when the distribution has a single peak)
- Positive Skew (Right Skew): Mean > Median
- Negative Skew (Left Skew): Mean < Median
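The effect of an outlier on the mean and the median can be checked with Python's standard `statistics` module. The ages below are hypothetical, and the second list assumes the last value was mistyped as 300.

```python
from statistics import mean, median

ages = [21, 23, 24, 27, 30]               # hypothetical data
ages_with_error = [21, 23, 24, 27, 300]   # same data, last value mistyped

print(mean(ages), median(ages))                        # mean 25, median 24
print(mean(ages_with_error), median(ages_with_error))  # mean 79, median 24
```

The single bad value drags the mean from 25 to 79 while the median stays at 24. The result is the Mean > Median pattern listed above for a positive (right) skew.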
Measures of Spread
Range
Range = Maximum value - Minimum value. (Note: Sample range tends to underestimate population range.)
Quartiles and IQR
- Q1 (First Quartile): Cuts off the bottom quarter of data; it is the median of the lower half of the data set.
- Q3 (Third Quartile): Cuts off the top quarter of data; it is the median of the upper half.
- IQR (Interquartile Range): Q3 - Q1. Covers the middle 50% of the distribution.
**When n is odd, include the median in both halves of the data set when calculating quartiles.**
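The quartile rule above can be written out as a short function. This sketch follows the convention stated in these notes (the median joins both halves when n is odd); the function name and data are hypothetical, and other textbooks and software use different quartile conventions, so their results may differ slightly.

```python
from statistics import median

def quartiles_inclusive(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    When n is odd, the overall median is included in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half; counts the median when n is odd
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(upper)

data = [5, 11, 21, 24, 27, 28, 30, 42, 50]   # hypothetical data, n = 9
q1, q3 = quartiles_inclusive(data)
iqr = q3 - q1
print(q1, q3, iqr)   # 21 30 9
```

Here the median is 27, the lower half is 5, 11, 21, 24, 27 and the upper half is 27, 28, 30, 42, 50, so Q1 = 21, Q3 = 30, and IQR = 9.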
Boxplot
Standard Deviation
The most common descriptive measure of spread.
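The calculation can be sketched step by step using the notation above. The notes do not give the formula, so this assumes the usual sample version: the sum of squared deviations from x-bar divided by n - 1 (the variance), then the square root. The `sample_sd` name and the data are hypothetical.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation, assuming the n - 1 denominator."""
    n = len(values)
    x_bar = sum(values) / n                             # mean ("x-bar")
    sum_squares = sum((x - x_bar) ** 2 for x in values) # squared deviations
    variance = sum_squares / (n - 1)                    # sample variance
    return sqrt(variance)

data = [21, 23, 24, 27, 30]   # hypothetical data
s = sample_sd(data)
print(round(s, 4))            # 3.5355
```

For these values x-bar = 25, the squared deviations sum to 50, and the variance is 50 / 4 = 12.5. The standard library function `statistics.stdev` uses the same n - 1 convention.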
Rules and Reporting
Chebychev's Rule (Applicable to all distributions)
At least 75% of the values fall within the range μ ± 2σ.
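The 75% figure is the k = 2 case of the general bound 1 - 1/k^2. As a quick check, the sketch below counts how many values of a deliberately skewed, hypothetical data set fall within μ ± 2σ, treating the data as a whole population (so σ is computed with `pstdev`).

```python
from statistics import fmean, pstdev

def fraction_within(values, k):
    """Fraction of values within k population standard deviations of the mean."""
    mu = fmean(values)
    sigma = pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]   # hypothetical, strongly right-skewed
frac = fraction_within(skewed, 2)
bound = 1 - 1 / 2 ** 2                     # Chebychev lower bound for k = 2
print(frac, bound)                         # 0.9 0.75
```

Nine of the ten values lie inside the interval, which is above the guaranteed minimum of 75% even though the data are far from mound-shaped.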
Rounding
Carry at least four significant digits during calculations.
Choosing Summary Statistics
Always report measures of central location, spread, and sample size.
- For symmetrical, mound-shaped distributions: Report Mean and Standard Deviation.
- For skewed or odd-shaped distributions: Report the five-number summary (minimum, Q1, median, Q3, maximum), or at least the Median and IQR.