Mastering Data Visualization and Descriptive Statistics
Visualizing Discrete and Continuous Data
Bar charts describe data from a discrete variable. The procedure has two steps: first, build a frequency table that counts how often each value occurs; then draw one bar per value, with a height equal to its frequency.
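The two steps can be sketched in Python using only the standard library. This is a minimal illustration, not a prescribed method: the sample data and the helper name frequency_table are hypothetical, and the text bars stand in for a real plotting library.

```python
from collections import Counter

def frequency_table(values):
    """Return sorted (value, absolute frequency, relative frequency) rows."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n) for v, c in sorted(counts.items())]

# Hypothetical sample: number of children per household.
children = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]

table = frequency_table(children)
for value, absolute, relative in table:
    # One '#' per observation gives a rough text bar chart.
    print(f"{value}: {'#' * absolute} ({absolute}, {relative:.0%})")
```

Each printed row is one bar: the value, a bar whose length is the absolute frequency, and the relative frequency as a percentage.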
Cumulative Bar Charts
In addition to the standard bar chart, we can draw a cumulative bar chart:
- The standard bar chart allows for comparison among groups.
- The cumulative bar chart displays the running total of frequencies (or percentages) up to and including each value.
- Note: Cumulative bar charts require an inherent order among outcomes, so they are not suitable for nominal (unordered) categorical variables.
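As a rough sketch of what a cumulative bar chart plots, the snippet below accumulates frequencies in sorted order and expresses them as percentages. The data and the function name cumulative_percentages are hypothetical; sorting the outcomes is the step that depends on their having an inherent order.

```python
from collections import Counter
from itertools import accumulate

def cumulative_percentages(values):
    """Return (value, cumulative percentage) pairs in increasing order."""
    counts = Counter(values)
    ordered = sorted(counts)  # only meaningful if the outcomes can be ordered
    running = accumulate(counts[v] for v in ordered)
    n = len(values)
    return [(v, 100 * total / n) for v, total in zip(ordered, running)]

# Hypothetical sample: number of children per household.
children = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]
cumulative = cumulative_percentages(children)
for value, percent in cumulative:
    print(f"<= {value}: {percent:.0f}%")
```

The last bar always reaches 100%, because it includes every observation.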
Histograms for Continuous Variables
Histograms are the equivalent of bar charts for continuous variables:
- To draw a histogram, you must first group the data, similar to building a frequency table.
- Key Difference: In a histogram, the area of each bar represents the frequency of the class, while the height represents the frequency density (the frequency divided by the class width).
- You can study different groups separately or improve the visualization with a frequency polygon, drawn by joining the midpoints of the tops of the columns.
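The relationship between area, height, and class width can be illustrated with a short sketch. The data, the class edges, and the helper name histogram_bars are assumptions chosen for the example; classes are treated as half-open intervals [left, right), which is one common convention.

```python
def histogram_bars(values, edges):
    """Return (left, right, frequency, density) for each class [left, right)."""
    bars = []
    for left, right in zip(edges, edges[1:]):
        frequency = sum(left <= v < right for v in values)
        density = frequency / (right - left)  # bar height
        bars.append((left, right, frequency, density))
    return bars

# Hypothetical heights in cm, grouped into classes of unequal width.
heights = [150, 152, 158, 161, 163, 164, 167, 171, 178, 185]
edges = [150, 160, 170, 190]

bars = histogram_bars(heights, edges)

# Frequency polygon: one point per class, at the midpoint of the bar's top.
polygon = [((left + right) / 2, density) for left, right, _, density in bars]
```

The last class is twice as wide as the others, so its bar is drawn lower even though it holds as many observations as the first class. Multiplying each height by its class width recovers the frequency.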
Stem-and-Leaf Plots
Stem-and-leaf plots show the distribution of the data while preserving the individual values, so they provide more detail than histograms. They are best suited for datasets of 150 items or fewer.
- Structure: Two columns, one for stems (the leading digits, e.g., the tens) and one for leaves (the final digit, e.g., the units).
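A minimal sketch of the two-column structure follows. It assumes non-negative integers with the tens as stems and the units as leaves; the sample scores and the function name stem_and_leaf are hypothetical, and stems with no observations are simply omitted here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (unit digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores.
scores = [12, 15, 21, 23, 23, 28, 30, 34, 37, 41]

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Read sideways, the rows of leaves form a rough histogram, but every original value can still be recovered from its stem and leaf.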
Summary of Graph Selection
- Categorical: Bar chart, Pie chart.
- Discrete: Bar chart, Cumulative bar chart.
- Continuous: Stem-and-leaf plot, Histogram, Cumulative frequency graph, Boxplot.
Numerical Analysis and Descriptive Statistics
While graphs help detect data features, numerical analysis provides precision:
- Central Tendency: Identifies the central points of the data.
- Statistical Dispersion: Describes how spread out or concentrated the data are.
- Skewness: Measures how asymmetric the distribution is.
- Kurtosis: Detects extreme features such as sharp peaks or fat tails.
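As an illustration, the sketch below computes one measure of each kind from the moments of the data. It assumes the population convention (dividing by n) and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). Other conventions exist, and the function name shape_summary and the samples are hypothetical.

```python
from statistics import fmean, pstdev

def shape_summary(values):
    """Mean, population standard deviation, skewness, and excess kurtosis."""
    n = len(values)
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation (divides by n)
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "std_dev": sd,
        "skewness": skewness,
        "excess_kurtosis": kurtosis - 3,
    }

symmetric = shape_summary([1, 2, 3, 4, 5])       # hypothetical symmetric sample
right_skewed = shape_summary([1, 1, 2, 2, 10])   # hypothetical sample with a long right tail
```

A symmetric sample gives a skewness of about zero, while a sample with a long right tail gives a positive skewness.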
Understanding Central Tendency
Central values summarize where the data tend to cluster, that is, the typical values of a dataset. Common statistics include:
- Mode: The most frequent value(s); used for categorical and discrete variables.
- Modal Class: The most frequent class(es); used for grouped data.
- Mean: The arithmetic average; used for numerical variables only.
- Median: The middle point of the sorted data (the average of the two middle values when the number of observations is even); used for numerical variables only.
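These statistics can be computed with Python's standard statistics module, as in the minimal sketch below. The sample data are hypothetical; multimode is used because it returns every value tied for most frequent.

```python
from statistics import mean, median, multimode

# Hypothetical numerical sample (already sorted for readability).
scores = [2, 3, 3, 5, 8, 8, 13]

score_mean = mean(scores)        # arithmetic average: 42 / 7
score_median = median(scores)    # middle value of the sorted data
score_modes = multimode(scores)  # all most frequent values

# The mode also works for categorical data; mean and median do not.
colours = ["red", "blue", "red", "green"]
colour_modes = multimode(colours)

print(score_mean, score_median, score_modes, colour_modes)
```

Here the numerical sample has two modes, which shows why the definition of the mode allows for more than one value.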