Fundamental Data Mining Tasks and Algorithmic Approaches
Classified in Mathematics
Written on in
English with a size of 2.73 KB
Fundamental Data Mining Tasks and Concepts
Prediction: Forecasting Time Series Data
Prediction involves estimating the future value of a variable that is subject to random changes over time. This technique is strictly applied to time series, which are datasets whose domain is time.
Regression Analysis: Modeling Relationships
Regression is a generalization of classification (when the domain involves continuous classes) and prediction. The resulting model (classification or prediction) depends on the significance of the dependent and independent variables.
The goal is to find a mathematical or statistical model that properly relates the dependent variable with the independent variables. Geometrically, regression seeks a function that passes as close as possible (on average) to the individuals (data points) that constitute the sample.
Association Rule Mining (Partnership)
Association Rule Mining addresses issues such as Market Basket Analysis to obtain customer buying trends. It aims to find possible relationships between two seemingly independent events.
Core Tasks of Data Mining Algorithms
Estimating Population Parameters
Estimation involves determining population parameters based on the available sample (data matrix, X). These parameters represent information that can be very useful, especially in market research.
Example: Estimating the level of demand for laptops in the city of Merida in 2010.
Grouping (Clustering)
Grouping consists of dividing a sample into two or more groups (clusters). The objective is to minimize the variance within groups and maximize the variance between groups.
This means:
- Individuals who are part of a group should be as similar as possible. Geometrically, these individuals (points in p-dimensional space) should be as close as possible.
- Individuals from different groups should be as far apart as possible.
Each group becomes a class. Note that in such tasks, models are typically not built or used.
Classification (Rating)
Classification involves developing or constructing a model that assigns a class to an individual according to their position in space (based on their values in each variable).
Key components of the model:
- The dependent variable is the class label.
- The independent variables relate to individual characteristics.
The parameters of this model depend on the sample used. The sample contains a set of n individuals, each of which belongs to one of the existing C classes.