Machine Learning Algorithms: Comparison and Best Practices

Written on June 9, 2026 in English with a size of 5.26 KB

Supervised Learning

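Random Forest
Type: Classification + Regression
Scaling: NO
Outliers: Robust
Categorical Variables: YES (in principle; many libraries still need them encoded)
Idea: Many trees, each trained on a bootstrap sample (BAGGING) → majority vote (classification) or average (regression) → final prediction
Advantages: Reduces overfitting, stable, feature importance; some implementations handle missing data
Disadvantages: Slower than a single tree, hard to interpret, many hyperparameters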
Metrics: Accuracy, Feature Importance, OOB error
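
The bagging-plus-vote idea can be sketched in plain Python. This is a toy illustration, not a full random forest: the trees are one-split stumps, the names (`fit_stump`, `fit_forest`, `predict_forest`) and the data are made up for the example, and the per-split feature subsampling that real random forests add is left out.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no valid split (all rows identical): always predict the majority
        lab = majority(y)
        return (0, float("inf"), lab, lab)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Final prediction = majority vote over all trees."""
    return majority([predict_stump(stump, row) for stump in forest])

# Hypothetical toy data: two well-separated classes.
X = [(1, 1), (2, 1), (3, 2), (4, 2), (10, 8), (11, 9), (12, 8), (13, 9)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(X, y)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (20, 20)))
```

Rows that a given tree never drew are its out-of-bag rows; scoring each tree on those rows is what the OOB error listed under Metrics measures.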

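Linear Regression
Type: Regression (continuous Y only)
Scaling: Recommended (not required for ordinary least squares; matters for regularization, gradient descent, and comparing coefficients)
Outliers: NOT robust (outliers distort the fitted line)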
Categorical Variables: NO (encode first)
Idea: y = b0 + b1·x1 + b2·x2 + ... → predicts a continuous number
Advantages: Simple, fast, interpretable, coefficients show feature impact
Disadvantages: Assumes linearity, fails on complex patterns, sensitive to outliers
Metrics: MAE, MSE, RMSE, R²
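
As a rough illustration of the formula and the metrics above, here is a one-feature least-squares fit in plain Python. The function names and the data are invented for the example; a real project would normally use a library.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (b0, b1) in y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for a list of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]          # roughly y = 1 + 2x
b0, b1 = fit_line(xs, ys)
metrics = regression_metrics(ys, [b0 + b1 * x for x in xs])

# Same data with one outlier in the last target value.
ys_outlier = [3.1, 4.9, 7.2, 8.8, 50.0]
b0_out, b1_out = fit_line(xs, ys_outlier)
print(b1, b1_out, metrics)
```

With this made-up data, the single outlier moves the slope from about 1.97 to about 9.77, which is the distortion the Outliers line describes.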

K-Nearest Neighbors (KNN)
Type: Classification + Regression
Scaling: YES (distance-based — MUST scale)
Outliers: NOT robust
Categorical Variables: NO
Idea: Classification: majority vote of K neighbors; Regression: average of K neighbors
Advantages: Simple, no training phase, works with small data
Disadvantages: Slow at prediction time on large data (every query is compared with all stored points), sensitive to outliers
Metrics: Accuracy, F1 (class); MAE, MSE, R² (regression)
Note: K too low = overfit/noisy; K too high = underfit. K = 5 is a common starting point; tune K with cross-validation.
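
A small sketch of both KNN modes, plus the reason scaling matters. The helper names (`standardize`, `knn_predict`) and the toy data are assumptions made for this example; the second feature is deliberately on a much larger scale than the first.

```python
import math
from collections import Counter

def standardize(rows):
    """Return z-scored rows plus a function that scales new rows the same way."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(row):
        return tuple((v - m) / s for v, m, s in zip(row, means, stds))

    return [scale(r) for r in rows], scale

def knn_predict(train_X, train_y, query, k=5, task="classification"):
    """Majority vote (classification) or average (regression) of the k nearest points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    targets = [t for _, t in nearest]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    return sum(targets) / k

# The first feature (range 0 to 1) separates the classes; the second is large-scale noise.
X = [(0.1, 1000), (0.2, 5000), (0.15, 3000), (0.9, 1200), (0.8, 4800), (0.85, 3100)]
y = ["a", "a", "a", "b", "b", "b"]
query = (0.12, 4850)

raw_prediction = knn_predict(X, y, query, k=3)
X_scaled, scale = standardize(X)
scaled_prediction = knn_predict(X_scaled, y, scale(query), k=3)
print(raw_prediction, scaled_prediction)
```

Without scaling, the large-valued second feature dominates the distance and the query is voted into class "b"; after standardization the informative first feature counts equally and the vote switches to "a".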

Unsupervised Learning

K-Means
Type: Clustering (NO target variable)
Scaling: YES (distance-based)
Outliers: NOT robust (shift centroids)
Categorical Variables: NO
Idea: K centroids → assign each point to nearest → update centroids → repeat until stable
Inertia: Sum of squared distances from each point to its assigned centroid (lower = more compact); used in the Elbow method
Elbow Method: Plot inertia vs K → pick K where curve bends
Advantages: Fast, simple, scalable
Disadvantages: Must set K, assumes spherical clusters, random initialization can lead to different results
Metrics: Inertia, Silhouette score
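
The assign/update loop and the inertia calculation can be sketched as follows. This is a minimal version under stated assumptions: the initial centroids are passed in explicitly (libraries usually choose them randomly or with a smarter seeding scheme), and the function names and data are illustrative only.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its points (unchanged if its cluster is empty)."""
    new = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(old)
    return new

def kmeans(points, init_centroids, max_iter=100):
    """Repeat assign -> update until the assignments stop changing."""
    centroids = [tuple(c) for c in init_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Inertia for K = 1 and K = 2; an elbow plot repeats this for more values of K.
inertia_by_k = {}
for k, init in [(1, [points[0]]), (2, [points[0], points[3]])]:
    centroids, labels = kmeans(points, init)
    inertia_by_k[k] = inertia(points, centroids, labels)
print(inertia_by_k)
```

On this toy data the inertia drops sharply from K = 1 to K = 2, which is the kind of bend the Elbow method looks for.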

Hierarchical Clustering
Type: Clustering (NO target variable)
Scaling: YES (distance-based)
Outliers: Depends on linkage
Categorical Variables: NO
Idea: Agglomerative: start with N single-point clusters → repeatedly merge the closest pair → dendrogram shows the merge history
Dendrogram: Tree diagram; cut horizontally → vertical lines crossed = number of clusters
Advantages: No K needed in advance, deterministic, shows full merge history
Linkage Types:
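- Single: Distance between the nearest pair of points; prone to chaining, but outliers stay as late-merging singletons, which helps spot them
- Complete: Distance between the farthest pair of points; avoids chains, gives compact clusters
- Average: Mean distance over all cross-cluster point pairs; less sensitive to outliers than single or complete
- Ward: Merges the pair that least increases within-cluster variance; a common default choice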
Disadvantages: Slow on large data, linkage choice matters significantly
Metrics: Dendrogram, Silhouette score, Elbow (distortion)
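
The merge loop and three of the linkage rules can be sketched naively in plain Python. The names below are illustrative, Ward linkage is omitted to keep the sketch short, and the brute-force pair search is only sensible for tiny inputs.

```python
import math

# Each linkage turns the list of cross-cluster point distances into one number.
LINKAGES = {
    "single": min,                                  # nearest pair
    "complete": max,                                # farthest pair
    "average": lambda ds: sum(ds) / len(ds),        # mean over all pairs
}

def cluster_distance(a, b, linkage):
    pairwise = [math.dist(p, q) for p in a for q in b]
    return LINKAGES[linkage](pairwise)

def agglomerate(points, n_clusters=1, linkage="single"):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the merge history as (distance, merged_cluster)
    pairs, which is the information a dendrogram draws.
    """
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_distance(
            clusters[ij[0]], clusters[ij[1]], linkage))
        d = cluster_distance(clusters[i], clusters[j], linkage)
        merged = clusters[i] + clusters[j]
        history.append((d, merged))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, history

points = [(0,), (1,), (5,), (6,), (20,)]
for name in LINKAGES:
    clusters, history = agglomerate(points, n_clusters=2, linkage=name)
    print(name, [d for d, _ in history], clusters)
```

All three linkages leave the point at 20 on its own here, but the height of the last merge differs by linkage, which is why the same cut height on a dendrogram can give different cluster counts.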

Association Rules
Note: High confidence with lift ≈ 1 means the consequent is just as common with or without the antecedent, so the rule adds nothing beyond the consequent's base rate and is useless.
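
A quick worked example of this case, using the usual definitions of support, confidence, and lift. The transactions and the `rule_metrics` helper are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)          # transactions containing the antecedent
    n_c = sum(c <= t for t in transactions)          # transactions containing the consequent
    n_ac = sum((a | c) <= t for t in transactions)   # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

# 10 baskets: milk is in 8 of them, bread in 5, both in 4.
transactions = (
    [{"bread", "milk"}] * 4
    + [{"bread"}]
    + [{"milk"}] * 4
    + [{"eggs"}]
)
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)
```

The rule bread → milk has a confidence of 0.8, which looks strong, but milk is already in 80% of all baskets, so the lift is 1 and knowing that a basket contains bread tells you nothing new about milk.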
