Machine Learning Algorithms: Comparison and Best Practices

Supervised Classification

Logistic Regression (LR)

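  • Type: Classification (binary by default; multinomial and one-vs-rest variants handle more than two classes)
  • Scaling: YES, recommended (StandardScaler); it matters most with regularization or gradient-based solvers
  • Outliers: NOT robust
  • Categorical Variables: NO (encode first)
  • Idea: Sigmoid of a linear score → probability between 0 and 1 → if ≥ 0.5 (default threshold) → class 1
  • Advantages: Fast, simple, interpretable, outputs probabilities
  • Disadvantages: Linear decision boundary, so it underfits non-linear data unless features are transformed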
  • Metrics: Accuracy, Precision, Recall, F1, Confusion Matrix
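
The sigmoid-and-threshold idea can be sketched in a few lines of plain Python. The weights and bias below are hypothetical values chosen only for illustration; in practice they would be learned from data, and the helper names are not from any library.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted class) for one sample."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))  # linear score
    p = sigmoid(z)
    return p, int(p >= threshold)

# Hypothetical coefficients (assumed, not fitted).
weights, bias = [1.5, -2.0], 0.25

p_pos, label_pos = predict_class([2.0, 0.5], weights, bias)  # score = 2.25
p_neg, label_neg = predict_class([0.0, 1.0], weights, bias)  # score = -1.75
```

A positive score gives a probability above 0.5 and therefore class 1; raising the threshold trades recall for precision.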

Decision Trees (DT)

  • Type: Classification + Regression
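  • Scaling: NO (splits compare a feature to a threshold, so rescaling does not change them)
  • Outliers: Robust
  • Categorical Variables: YES in principle; some libraries (e.g. scikit-learn) still require numeric encoding
  • Idea: IF-ELSE splits by feature → leaf = final prediction
  • Advantages: Interpretable, no scaling, handles numeric and categorical features, fast
  • Disadvantages: Overfits easily, unstable (small changes in the data can produce a very different tree)
  • Metrics: Accuracy, Confusion Matrix (Gini impurity is the split criterion, not an evaluation metric)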
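
How a tree picks an IF-ELSE split can be illustrated with Gini impurity on a single numeric feature. This is a minimal sketch with a made-up dataset; the function names are illustrative, and real implementations search over all features and recurse.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted impurity of the two children of the split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the purest split."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(midpoints, key=lambda t: split_gini(values, labels, t))

# Toy data (assumed): age of a customer and whether they bought.
ages = [22, 25, 30, 35, 40, 52]
bought = [0, 0, 0, 1, 1, 1]

root_impurity = gini(bought)             # 0.5 for a 50/50 class mix
threshold = best_threshold(ages, bought)  # 32.5 separates the classes perfectly
```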

Random Forest (RF)

  • Type: Classification + Regression
  • Scaling: NO
  • Outliers: Robust
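  • Categorical Variables: YES in principle; some libraries still require numeric encoding
  • Idea: Many trees, each trained on a bootstrap sample with random feature subsets (BAGGING) → majority vote (average for regression) → final prediction
  • Advantages: Reduces overfit, stable, feature importance; some implementations handle missing data
  • Disadvantages: Slower than a single tree, harder to interpret, many hyperparameters
  • Metrics: Accuracy, OOB error (feature importance is a diagnostic, not a performance metric)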
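
Two pieces of the bagging idea, bootstrap sampling and majority voting, can be sketched without any tree code. The votes below are hypothetical tree outputs for one sample, and the helper names are illustrative only.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (the 'bag' for one tree)."""
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(predictions):
    """Return the most common class among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
n_rows = 1000
bag = bootstrap_indices(n_rows, rng)

# Rows never drawn are "out of bag" (OOB) for this tree and can be used
# to estimate its error without a separate validation set.
oob_rows = set(range(n_rows)) - set(bag)
oob_fraction = len(oob_rows) / n_rows  # roughly 1/e (about 0.37) for large n

# Hypothetical votes from five trees for a single sample.
votes = ["spam", "ham", "spam", "spam", "ham"]
final_prediction = majority_vote(votes)
```

The OOB error listed under Metrics is the error of each sample measured only on the trees whose bag did not contain it.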

Supervised Regression

Linear Regression

  • Type: Regression (continuous Y only)
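  • Scaling: Optional for ordinary least squares (predictions do not change); recommended with regularization (Ridge, Lasso) or gradient descent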
  • Outliers: NOT robust (outliers distort line)
  • Categorical Variables: NO (encode first)
  • Idea: y = b0 + b1·x1 + b2·x2 + ... → predicts a continuous number
  • Advantages: Simple, fast, interpretable, coefficients show feature impact
  • Disadvantages: Assumes a linear relationship, misses non-linear patterns, sensitive to outliers
  • Metrics: MAE, MSE, RMSE, R²
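
As a worked example of the formula and the metrics, the sketch below fits a one-feature line with the closed-form least-squares solution and then computes MAE, MSE, RMSE and R². The data points are invented for illustration, and the function names are not from any library.

```python
import math

def fit_simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x (single feature)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² as a dict."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - sum(e * e for e in errors) / ss_total,
    }

# Toy data (assumed), roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)  # intercept about 0.05, slope about 1.99
y_hat = [b0 + b1 * xi for xi in x]
metrics = regression_metrics(y, y_hat)
```

RMSE is in the same units as y, which usually makes it easier to read than MSE; R² close to 1 means the line explains most of the variance.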

K-Nearest Neighbors (KNN)

  • Type: Classification + Regression
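  • Scaling: YES (distance-based, so features MUST be on comparable scales)
  • Outliers: NOT robust
  • Categorical Variables: NO (encode first; distances need numeric features)
  • Idea: Classification: majority vote of K neighbors; Regression: average of K neighbors
  • Advantages: Simple, no explicit training phase (lazy learner: it only stores the data), works with small data
  • Disadvantages: Slow at prediction time on large data, sensitive to outliers and irrelevant features
  • Metrics: Accuracy, F1 (classification); MAE, MSE, R² (regression)
  • Note: K too low = overfit/noisy; K too high = underfit. K = 5 is a common starting point; tune K with cross-validation.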
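
A minimal sketch of KNN with standardization, using only the standard library. The two features are deliberately on very different scales to show why scaling matters; the dataset, labels and helper names are assumptions for illustration.

```python
import math
from collections import Counter

def standardize(rows):
    """Scale each column to zero mean and unit variance; return stats for reuse."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds

def knn_predict(train_x, train_y, query, k=5, task="classification"):
    """Majority vote (classification) or mean (regression) of the k nearest rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = [train_y[i] for i in order[:k]]
    if task == "classification":
        return Counter(nearest).most_common(1)[0][0]
    return sum(nearest) / k

# Toy data (assumed): column 0 is small-scale, column 1 is large-scale.
raw_x = [[1.0, 1000], [1.2, 1100], [0.9, 900], [3.0, 1000], [3.2, 1050], [2.9, 950]]
classes = ["A", "A", "A", "B", "B", "B"]
targets = [10, 12, 9, 30, 32, 29]

train_x, means, stds = standardize(raw_x)
# The query must be scaled with the training means and standard deviations.
query = [(v - m) / s for v, m, s in zip([1.1, 1080], means, stds)]

label = knn_predict(train_x, classes, query, k=3)
value = knn_predict(train_x, targets, query, k=3, task="regression")
```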

Unsupervised Learning

K-Means (KM)

  • Type: Clustering (NO target variable)
  • Scaling: YES (distance-based)
  • Outliers: NOT robust (outliers pull centroids toward them)
  • Categorical Variables: NO
  • Idea: K centroids → assign each point to nearest → update centroids → repeat until stable
  • Inertia: Sum of squared distances from each point to its assigned centroid (lower = more compact); used in the Elbow method
  • Elbow Method: Plot inertia vs K → pick the K where the curve bends and further gains flatten
  • Advantages: Fast, simple, scalable
  • Disadvantages: Must set K, assumes roughly spherical clusters of similar size, results depend on random initialization (mitigate with k-means++ and several restarts)
  • Metrics: Inertia, Silhouette score
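
The assign/update loop and the inertia used by the Elbow method can be sketched as follows. This is a bare-bones version with plain random initialization (no k-means++), run on invented points; the function names are illustrative.

```python
import math
import random

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Return (centroids, labels, inertia) after iterating until stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # stable: nothing moved
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Toy data (assumed): two well-separated groups.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]

centroids, labels, inertia = kmeans(points, k=2)
# Inertia per K, as one would plot for the Elbow method.
inertia_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
```

On this data the inertia drops sharply from K = 1 to K = 2 and only slightly afterwards, which is the bend the Elbow method looks for.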

Hierarchical Clustering (HC)

  • Type: Clustering (NO target variable)
  • Scaling: YES (distance-based)
  • Outliers: Depends on linkage
  • Categorical Variables: NO
  • Idea: Agglomerative: start with N single-point clusters → repeatedly merge the closest pair → dendrogram records the merge history
  • Dendrogram: Tree diagram; cut horizontally → vertical lines crossed = number of clusters
  • Advantages: No K needed in advance, deterministic, shows full merge history
  • Linkage Types:
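    • Single: Distance between the nearest pair of points; prone to chaining, tends to leave outliers as singletons (useful for spotting them)
    • Complete: Distance between the farthest pair of points; gives compact clusters and avoids chains, but is sensitive to outliers
    • Average: Mean of all pairwise distances between the two clusters (not the centroid distance); less affected by outliers than single or complete
    • Ward: Merges the pair that least increases within-cluster variance; a good general default with Euclidean distance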
  • Disadvantages: Slow on large data, linkage choice matters significantly
  • Metrics: Silhouette score, Elbow (distortion); the dendrogram is a visual aid for choosing the cut, not a score
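
The difference between linkage types comes down to how the distance between two clusters is measured. The sketch below computes single, complete and average linkage as the minimum, maximum and mean of all pairwise point distances, then runs a naive agglomerative loop. The points are invented, Ward linkage is omitted for brevity, and the function names are illustrative.

```python
import math
from itertools import product

def pairwise(a, b):
    """All point-to-point distances between clusters a and b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    return min(pairwise(a, b))       # nearest pair

def complete_linkage(a, b):
    return max(pairwise(a, b))       # farthest pair

def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)           # mean of all pairs

def agglomerate(points, linkage):
    """Merge the closest pair of clusters until one remains; return merge heights."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        heights.append(linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return heights

# Toy clusters (assumed).
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

d_single = single_linkage(cluster_a, cluster_b)
d_complete = complete_linkage(cluster_a, cluster_b)
d_average = average_linkage(cluster_a, cluster_b)

# Merge heights are what a dendrogram draws on its vertical axis.
heights = agglomerate(cluster_a + cluster_b, single_linkage)
```

A horizontal cut between the second and third merge heights leaves two clusters, which matches the two groups in the data.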

Association Rules (APr)

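  • Note: High Confidence with Lift ≈ 1 means the consequent is about equally common with or without the antecedent (the two are close to independent). The confidence is high only because the consequent is frequent overall, so the rule is useless.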
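
The note can be checked with a small calculation of support, confidence and lift. The transactions below are invented so that bread is bought in 90% of baskets regardless of butter; the function name is illustrative and not part of any Apriori library.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)          # baskets containing the antecedent
    n_c = sum(c <= t for t in transactions)          # baskets containing the consequent
    n_both = sum((a | c) <= t for t in transactions)  # baskets containing both
    support = n_both / n
    confidence = n_both / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift

# Toy baskets (assumed): bread appears in 18 of 20 baskets.
transactions = (
    [{"butter", "bread"}] * 9
    + [{"butter"}] * 1
    + [{"bread"}] * 9
    + [{"milk"}] * 1
)

support, confidence, lift = rule_metrics(transactions, {"butter"}, {"bread"})
```

Here the confidence is 0.9, yet the lift is 1.0: buying butter does not change the chance of buying bread, so the rule gives no actionable information.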
