Statistical Relationships: Scatter, Correlation, Regression

Posted by Anonymous and classified in Mathematics


What is a Scatter Diagram?

Definition

A scatter diagram (or scatter plot) is a graphical representation of two variables in which each point represents one observation: a pair of values, one for each variable. The horizontal axis (X-axis) represents one variable, and the vertical axis (Y-axis) represents the other.

Construction

Each point (x_i, y_i) is plotted on the graph for the corresponding values of the two variables.
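
As a rough illustration of this construction, the sketch below pairs two equal-length lists into points (x_i, y_i) and places each one on a character grid. The function name `ascii_scatter` and the grid dimensions are assumptions made for this example; a plotting library would normally be used instead.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Render paired observations (x_i, y_i) as a rough text scatter plot."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when every value on an axis is identical,
    # so the scaling below never divides by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so a larger y maps to a smaller row.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Illustrative values only, not real measurements.
print(ascii_scatter([1, 2, 3, 4, 5], [2, 3, 5, 6, 9]))
```

Each asterisk is one observation; the horizontal position comes from the X value and the vertical position from the Y value.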

Utility in Correlation Analysis

Scatter diagrams are essential for:

  • Visualizing relationships: Helps identify if a linear or non-linear relationship exists.
  • Direction of correlation:
    • Positive correlation: As X increases, Y increases (points slope upwards).
    • Negative correlation: As X increases, Y decreases (points slope downwards).
    • No correlation: No discernible pattern; points are scattered randomly.
  • Strength of correlation: Closeness of points to an imaginary line indicates strength:
    • Strong: Points closely clustered.
    • Weak: Points widely scattered.
  • Detecting outliers: Points that deviate significantly from the trend can be identified.
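
The direction and strength that a scatter diagram shows visually can be summarized numerically with the Pearson correlation coefficient r, which measures linear association only. The sketch below is a minimal pure-Python version; the helper `describe` and its `strong_cutoff` value are hypothetical conveniences for this example, not standard thresholds.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong_cutoff=0.7):
    """Label direction by the sign of r and strength by |r|.

    strong_cutoff is an arbitrary illustrative threshold, not a standard.
    """
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {direction}"


# Illustrative values only: y rises steadily with x.
print(describe(pearson_r([1, 2, 3, 4], [2, 4, 5, 7])))
```

A value of r near +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to no linear pattern. A non-linear relationship can still produce a small r, which is one reason to inspect the plot as well.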

Example

When studying the height (X) and weight (Y) of individuals, a scatter plot can show whether taller individuals tend to weigh more.

Partial vs. Multiple Correlation

Aspect | Partial Correlation | Multiple Correlation
Definition | Measures the degree of association between two variables, keeping other variables constant. | Measures the combined relationship of multiple independent variables with one dependent variable.
Purpose | Isolates the effect of one variable by removing the influence of others. | Examines how multiple variables together explain the variation in a single variable.
Example | Correlation between income and education, controlling for age. | Correlation between income (dependent) and both education and experience (independent).
Symbol | Usually denoted as r_xy.z (x and y, controlling for z) | Denoted as R or R^2
Application | Used in path analysis, structural equation modeling. | Used in multiple regression, predictive modeling.
Value Range | -1 to +1 | 0 to +1
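
For the three-variable case, both quantities can be computed directly from the pairwise correlation coefficients. The sketch below uses the standard first-order formulas; the function names `partial_corr` and `multiple_corr` are assumptions made for this example, and the inputs are assumed to come from a valid correlation matrix.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z (x and y, controlling for z)."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


def multiple_corr(r_y1, r_y2, r_12):
    """Multiple correlation R of y with two predictors x1 and x2.

    r_y1, r_y2: correlations of y with x1 and with x2.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) == 1:
        raise ValueError("undefined when the predictors are perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, r_squared))


# Illustrative correlations only, not taken from real data.
print(partial_corr(0.6, 0.5, 0.4))
print(multiple_corr(0.6, 0.5, 0.3))
```

The partial correlation can be negative or positive, while R is a square root and therefore never negative, which matches the value ranges in the table.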

Defining Regression and Regression Lines

Definition: Regression is a statistical method for estimating the relationship between a dependent variable (Y) and one or more independent variables (X). Simple linear regression involves a single independent variable and models the relationship as a straight line.

Regression Lines

For two variables X and Y there are two regression lines, which coincide only when the correlation is perfect:

  1. Regression of Y on X (predicting Y from X)
  2. Regression of X on Y (predicting X from Y)
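
A minimal least-squares sketch of both lines follows. The function name `regression_lines` and the sample values are assumptions made for this example. Each line is returned as an (intercept, slope) pair, with slopes b_yx and b_xy.

```python
def regression_lines(xs, ys):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for paired observations.

    Y on X:  y = a_yx + b_yx * x   (minimizes vertical errors)
    X on Y:  x = a_xy + b_xy * y   (minimizes horizontal errors)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a regression line is undefined for a constant variable")

    b_yx = sxy / sxx  # slope of Y on X
    b_xy = sxy / syy  # slope of X on Y
    # Both lines pass through the point of means (mean_x, mean_y).
    a_yx = mean_y - b_yx * mean_x
    a_xy = mean_x - b_xy * mean_y
    return (a_yx, b_yx), (a_xy, b_xy)


# Illustrative values only.
(a_yx, b_yx), (a_xy, b_xy) = regression_lines([1, 2, 3, 4], [2, 4, 5, 7])
print(f"Y on X: y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"X on Y: x = {a_xy:.3f} + {b_xy:.3f} y")
```

The product of the two slopes, b_yx * b_xy, equals r^2, so the two lines are identical only when r is +1 or -1.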

1. Regression Equation of Y on X
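
In its standard textbook form, the regression line of Y on X can be written as follows. The notation is an assumption: \bar{X} and \bar{Y} are the sample means, r is the correlation coefficient, \sigma_x and \sigma_y are the standard deviations, and b_{yx} is the regression coefficient of Y on X.

```latex
Y - \bar{Y} = b_{yx}\,(X - \bar{X}),
\qquad
b_{yx} = r\,\frac{\sigma_y}{\sigma_x}
       = \frac{\sum_i (x_i - \bar{X})(y_i - \bar{Y})}{\sum_i (x_i - \bar{X})^2}
```

This line is used to predict Y for a given value of X, and it always passes through the point of means (\bar{X}, \bar{Y}).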

