Fundamentals of Statistical Measurement and Data Analysis

Classified in Mathematics

Written on in English with a size of 1.57 MB

Chapter 1: Understanding Variables

Types of Variables

  • Categorical: Smoker (current, former, no)
  • Ordinal: Nonsmoker, light, moderate, heavy smoker (ordered categories)
  • Quantitative: BMI, Age, Weight (numerical measurements)

Key Definitions

  • Observation: The unit on which measurements are made (an individual or an aggregate).
  • Variable: The generic characteristic we measure (e.g., age).
  • Value: A realized measurement (e.g., 27).

Chapter 2: Statistical Studies

Surveys: Census and Sampling

  • Goal: Describe population characteristics.
  • Census: Attempts to reach the entire population (costly, time-consuming).
  • Sampling: Uses a sample of the population (allows for inferences, saves time and money).
  • Simple Random Sampling: Based on probability; every member of the population has the same chance of being selected.
  • Issues with Sampling: Under-coverage, volunteer bias, and nonresponse bias.
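As a rough illustration of simple random sampling, the sketch below draws a sample without replacement from a sampling frame. The frame of 500 ID numbers, the sample size of 25, and the function name `simple_random_sample` are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers 1 through 500
population_ids = list(range(1, 501))
sample_ids = simple_random_sample(population_ids, 25, seed=1)

print(len(sample_ids))       # number of subjects drawn
print(sorted(sample_ids))    # the selected IDs
```

Note that random selection does not protect against under-coverage: if the frame itself omits part of the population, those members can never be drawn.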

Comparative Studies: Experimental and Non-experimental

These studies examine the relationship between an explanatory variable and a response variable (e.g., a study on whether weight gain causes hypertension).

  • Experimental: The investigator assigns subjects to the levels of the explanatory variable (exposed/unexposed).
  • Non-experimental: The investigator does not assign subjects; they are merely classified as exposed/unexposed.

Blinding Techniques

  • Single Blinding: Subjects are unaware of the specific treatment they receive.
  • Double Blinding: Subjects and investigators are unaware of the treatment assignments.
  • Triple Blinding: Subjects, investigators, and statisticians are unaware of the treatment assignments.

Chapter 3: Data Visualization and Distribution Shapes

Visualizing Data

[Figures: stem plot; rotated stem plot]

Distributional Shapes

  • Modality: The number of peaks in the distribution.
  • Kurtosis: The steepness or peakedness of the distribution.

Frequency Tables and Charts

Frequency Table

  • Histogram: Used for displaying quantitative measurements.
  • Bar Chart: Used for displaying categorical measurements.
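A frequency table can be sketched in a few lines. The example below counts a categorical variable and adds relative frequencies; the smoking-status data and the function name `frequency_table` are hypothetical, chosen only to match the variable types used earlier in these notes.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, relative frequency) rows, sorted by category."""
    counts = Counter(values)
    n = len(values)
    return [(category, counts[category], counts[category] / n)
            for category in sorted(counts)]

# Hypothetical smoking-status data for 8 subjects
smoker = ["current", "former", "no", "no", "former", "no", "current", "no"]
table = frequency_table(smoker)

for category, count, rel_freq in table:
    print(f"{category:<8} {count:>3} {rel_freq:>6.1%}")
```

The counts in such a table are what a bar chart plots for categorical data; for quantitative data the values would first be grouped into class intervals, which is what a histogram displays.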

Chapter 4: Measures of Central Location and Spread

Descriptive Measures

  • Central Location: Mean, Median, Mode
  • Spread: Range, IQR, Variance, and Standard Deviation

Notation

  • n = Sample size
  • N = Population size
  • X = Variable (e.g., ages of subjects)
  • xi = The value of X for individual i
  • Σ = Sum of all values (Capital Sigma)

Measures of Central Location

Mean: The Arithmetic Average ("x-bar")

x̄ = (1/n) Σ xi, i.e., the sum of the n sample values divided by the sample size.

The mean is the balancing point of a distribution.

Median: The Middle Value

The median is more robust in the face of outliers or errors.
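The robustness claim can be checked with a small experiment. The sketch below uses five hypothetical ages and then repeats the calculation after one value is mistyped (27 entered as 270); both data sets are invented for illustration.

```python
from statistics import mean, median

ages = [21, 23, 24, 25, 27]               # hypothetical ages
ages_with_error = [21, 23, 24, 25, 270]   # same data, 27 mistyped as 270

print("clean:     ", mean(ages), median(ages))
print("with error:", mean(ages_with_error), median(ages_with_error))
```

The single bad value pulls the mean from 24 up to 72.6, while the median stays at 24, because the median depends only on the ordering of the values around the middle.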

Relationship Between Mean and Median (Skew)

  • Symmetrical Distribution: Mean = Median (and, if the distribution has a single peak, Mean = Median = Mode)
  • Positive Skew (Right Skew): Mean > Median
  • Negative Skew (Left Skew): Mean < Median

Measures of Spread

Range

Range = Maximum value - Minimum value. (Note: Sample range tends to underestimate population range.)

Quartiles and IQR

  • Q1 (First Quartile): Cuts off the bottom quarter of data; it is the median of the lower half of the data set.
  • Q3 (Third Quartile): Cuts off the top quarter of data; it is the median of the upper half.
  • IQR (Interquartile Range): Q3 - Q1. Covers the middle 50% of the distribution.

Note: When n is odd, include the median in both halves of the data set when calculating quartiles.
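The quartile convention described here (median of each half, with the overall median counted in both halves when n is odd) can be sketched as follows. The function name `quartiles_inclusive` and the data are assumptions for illustration; statistical packages often use other quartile conventions and may return slightly different values.

```python
from statistics import median

def quartiles_inclusive(values):
    """Return (Q1, median, Q3, IQR) using the median-of-halves convention.

    When n is odd, the overall median is included in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        lower = data[:half + 1]   # lower half, including the median
        upper = data[half:]       # upper half, including the median
    else:
        lower = data[:half]
        upper = data[half:]
    q1 = median(lower)
    q3 = median(upper)
    return q1, median(data), q3, q3 - q1

# Hypothetical ages, n = 7 (odd)
q1, med, q3, iqr = quartiles_inclusive([21, 23, 24, 25, 27, 29, 30])
print(q1, med, q3, iqr)
```

For these seven values the lower half is 21, 23, 24, 25 and the upper half is 25, 27, 29, 30, so Q1 = 23.5, Q3 = 28, and IQR = 4.5.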

Boxplot

Standard Deviation

The most common descriptive measure of spread.
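A minimal sketch of the calculation, assuming the sample versions of the formulas (sum of squared deviations divided by n − 1): s² = Σ(xi − x̄)² / (n − 1) and s = √s². The function names and the ages data are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = sum(values) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    return sum_of_squares / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

ages = [21, 23, 24, 25, 27]   # hypothetical ages, mean = 24
print(sample_variance(ages))  # deviations -3, -1, 0, 1, 3 -> squares sum to 20
print(sample_sd(ages))
```

Here the squared deviations sum to 20, so the variance is 20 / 4 = 5 and the standard deviation is √5, roughly 2.236, expressed in the same units as the data.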

Rules and Reporting

Chebychev's Rule (Applicable to all distributions)

At least 75% of the values fall within the range μ ± 2σ.
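The rule can be checked empirically on any data set. The sketch below treats a small, strongly right-skewed set of hypothetical values as the whole population, so that μ and σ are the population mean and standard deviation. The parameter k generalizes the 2 in μ ± 2σ; the general bound 1 − 1/k² goes beyond what these notes state, although it reduces to 75% at k = 2.

```python
from statistics import fmean, pstdev

def proportion_within(values, k=2):
    """Proportion of values inside mu ± k*sigma (population mean and SD)."""
    mu = fmean(values)
    sigma = pstdev(values)
    inside = [x for x in values if mu - k * sigma <= x <= mu + k * sigma]
    return len(inside) / len(values)

# Hypothetical, strongly right-skewed data with one extreme value
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]

k = 2
bound = 1 - 1 / k ** 2            # Chebychev lower bound: 0.75 when k = 2
print(proportion_within(skewed, k), ">=", bound)
```

Even with the extreme value of 40, nine of the ten values fall within μ ± 2σ, comfortably above the guaranteed 75%.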

Rounding

Carry at least four significant digits during calculations.

Choosing Summary Statistics

Always report measures of central location, spread, and sample size.

  • For symmetrical, mound-shaped distributions: Report Mean and Standard Deviation.
  • For skewed or odd-shaped distributions: Report the 5-point summary (minimum, Q1, median, Q3, maximum), or at least the Median and IQR.
