Statistical Relationships: Scatter, Correlation, Regression
Posted by Anonymous and classified in Mathematics
What is a Scatter Diagram?
Definition
A scatter diagram (or scatter plot) is a graphical representation of two variables where each point represents an observation consisting of paired values from two datasets. The horizontal axis (X-axis) represents one variable, and the vertical axis (Y-axis) represents the other.
Construction
Each observation is plotted as a single point (x_i, y_i), with x_i measured along the X-axis and y_i along the Y-axis.
Utility in Correlation Analysis
Scatter diagrams are essential for:
- Visualizing relationships: Helps identify if a linear or non-linear relationship exists.
- Direction of correlation:
  - Positive correlation: As X increases, Y increases (points slope upwards).
  - Negative correlation: As X increases, Y decreases (points slope downwards).
  - No correlation: No discernible pattern; points are scattered randomly.
- Strength of correlation: Closeness of points to an imaginary line indicates strength:
  - Strong: Points closely clustered.
  - Weak: Points widely scattered.
- Detecting outliers: Points that deviate significantly from the trend can be identified.
Example
When studying the height (X) and weight (Y) of individuals, a scatter plot can show whether taller individuals tend to weigh more.
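The direction a scatter diagram shows visually can also be checked numerically with the Pearson correlation coefficient r. The following is a minimal sketch in plain Python; the function names `pearson_r` and `direction` and the height and weight figures are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def direction(r):
    """Label the sign of r using the terms from the list above."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical sample: heights in cm (X) and weights in kg (Y)
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [52, 58, 63, 66, 77]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}, direction: {direction(r)}")
```

A value of r close to +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to a widely scattered cloud. Note that r only measures linear association, so the plot itself is still needed to spot non-linear patterns and outliers.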
Partial vs. Multiple Correlation
Aspect | Partial Correlation | Multiple Correlation |
---|---|---|
Definition | Measures the degree of association between two variables, keeping other variables constant. | Measures the combined relationship of multiple independent variables with one dependent variable. |
Purpose | Isolates the effect of one variable by removing the influence of others. | Examines how multiple variables together explain the variation in a single variable. |
Example | Correlation between income and education, controlling for age. | Correlation between income (dependent) and both education and experience (independent). |
Symbol | Usually denoted as r_xy.z (correlation of x and y, controlling for z) | Denoted as R; its square, R^2, is the coefficient of multiple determination |
Application | Used in path analysis, structural equation modeling. | Used in multiple regression, predictive modeling. |
Value Range | -1 to +1 | 0 to +1 |
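For the three-variable case, both quantities in the table can be computed from the pairwise correlations alone. The sketch below uses the standard first-order formulas; the function names and the correlation values for income, education, age, and experience are hypothetical and serve only as an illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z from pairwise correlations.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom

def multiple_correlation(r_y1, r_y2, r_12):
    """Multiple correlation R of y on two predictors x1 and x2.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    if abs(r_12) == 1:
        raise ValueError("undefined when the two predictors are perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    # Guard against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(0.0, r_squared))

# Hypothetical pairwise correlations (assumed values, not real data)
r_income_education = 0.6
r_income_age = 0.4
r_education_age = 0.3
r_income_experience = 0.5
r_education_experience = 0.2

# Partial: income vs. education, controlling for age
print(partial_correlation(r_income_education, r_income_age, r_education_age))

# Multiple: income (dependent) on education and experience (independent)
print(multiple_correlation(r_income_education, r_income_experience,
                           r_education_experience))
```

The partial correlation keeps its sign, so it can fall anywhere between -1 and +1, whereas R is a square root and is therefore never negative, which matches the value ranges in the table.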
Defining Regression and Regression Lines
Definition: Regression is a statistical method used to estimate the relationship between a dependent variable (Y) and one or more independent variables (X). Simple linear regression uses exactly one independent variable, so it models the relationship between two variables as a straight line.
Regression Lines
For two variables X and Y, there are two regression lines:
- Regression of Y on X (predicting Y from X)
- Regression of X on Y (predicting X from Y)
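Both lines can be fitted by least squares from the same sums of deviations. The sketch below is a minimal illustration, not a full regression routine; the function name `regression_lines` and the sample data are hypothetical. It uses the standard slopes b_yx = S_xy / S_xx for Y on X and b_xy = S_xy / S_yy for X on Y, where S_xy is the sum of cross-products of deviations from the means.

```python
def regression_lines(xs, ys):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for the two least-squares lines.

    Y on X:  y = a_yx + b_yx * x
    X on Y:  x = a_xy + b_xy * y
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a regression line is undefined when a variable is constant")
    b_yx = sxy / sxx                 # slope for predicting Y from X
    b_xy = sxy / syy                 # slope for predicting X from Y
    a_yx = mean_y - b_yx * mean_x    # each line passes through (mean_x, mean_y)
    a_xy = mean_x - b_xy * mean_y
    return (a_yx, b_yx), (a_xy, b_xy)

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

(a_yx, b_yx), (a_xy, b_xy) = regression_lines(xs, ys)
print(f"Y on X: y = {a_yx:.2f} + {b_yx:.2f} * x")
print(f"X on Y: x = {a_xy:.2f} + {b_xy:.2f} * y")
print(f"b_yx * b_xy = {b_yx * b_xy:.2f}  (equals r squared)")
```

The two lines intersect at the point of means and coincide only when the correlation is perfect (r = +1 or -1). The product of the two slopes equals r^2, which ties the regression lines back to the correlation coefficient.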