Machine Learning Concepts: Regression, Trees, and Neural Networks

Role of Regression in Exploratory Data Analysis (EDA)

Regression analysis in EDA models the relationship between a dependent variable (Y) and one or more independent variables (X).

  • Relationship Visualization: It helps visualize how variables interact. Fitting a line (y = ax + b) through a scatter plot shows whether the relationship is linear or non-linear.
  • Correlation Identification: It identifies the nature of the association:
    • Positive Correlation: As X increases, Y increases.
    • Negative Correlation: As X increases, Y decreases.
    • No Correlation: Random distribution of points.
  • Prediction: It allows for the prediction of continuous values (e.g., house prices, temperature) based on the established trend line.
  • Outlier Detection: Plotting the regression line helps spot outliers—abnormal data points that deviate significantly from the trend.
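
As a concrete illustration, here is a minimal sketch (using NumPy, with made-up data) that fits a trend line through a scatter plot and flags points that deviate strongly from it:

```python
import numpy as np

# Made-up data: a roughly linear relationship with one planted outlier.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=x.shape)
y[25] += 20.0  # planted outlier

# Fit a line y = ax + b by least squares.
a, b = np.polyfit(x, y, deg=1)
residuals = y - (a * x + b)

# Flag points more than 3 standard deviations from the trend line.
outliers = np.abs(residuals) > 3 * residuals.std()
print(f"slope={a:.2f}, intercept={b:.2f}, outliers at x={x[outliers]}")
```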

Linear vs. Logistic Regression

Linear Regression

  • Used to predict continuous values (e.g., salary, marks, house price).
  • Output is a numerical value.
  • Uses a straight-line equation: y = a₀ + a₁x.
  • Shows a linear relationship between variables.
  • Used for regression problems, not classification.

Logistic Regression

  • Used to predict categories (e.g., 0/1, Yes/No, Spam/Not spam).
  • Output is probability, which is converted into classes.
  • Uses an S-shaped (sigmoid) curve, not a straight line.
  • Models the probability that input belongs to a class.
  • Used for classification problems, especially binary classification.
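
The contrast is easy to see in code. Here is a minimal sketch using scikit-learn; the API calls are standard, but both datasets are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: years of experience vs. salary (continuous target).
X = np.array([[1], [2], [3], [4], [5]])
salary = np.array([30, 35, 42, 50, 58])  # in thousands

reg = LinearRegression().fit(X, salary)
print(reg.predict([[6]]))  # a numerical value, roughly 64

# Made-up data: same inputs, but a pass/fail (binary) target.
passed = np.array([0, 0, 0, 1, 1])

clf = LogisticRegression().fit(X, passed)
print(clf.predict_proba([[3.5]]))  # a probability for each class
print(clf.predict([[3.5]]))       # the class label (0 or 1)
```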

Decision Tree Structure for Classification

A decision tree summarizes training data into a tree structure to make classification easy and interpretable.

  • Root Node: Represents the top-most attribute selected for splitting the data.
  • Internal Decision Nodes: Represent tests on input attributes. Each node checks a condition and branches out based on the result.
  • Branches: Represent the outcome of a test (e.g., Yes/No, True/False) leading to the next node.
  • Leaf Nodes: Represent the final classification or class labels. No further splitting happens here.
  • Rule Representation: Each path from the root to a leaf node represents a specific logical rule (IF-THEN rule) used to classify a new instance.
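
A small sketch using scikit-learn's DecisionTreeClassifier (with made-up weather data) shows the root-to-leaf structure printed as readable IF-THEN rules:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: [outlook (0=sunny, 1=rainy), humidity (%)] -> play (0/1)
X = [[0, 85], [0, 90], [1, 70], [1, 95], [0, 60], [1, 65]]
y = [0, 0, 1, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Each root-to-leaf path printed below is one IF-THEN classification rule.
print(export_text(tree, feature_names=["outlook", "humidity"]))
```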

ID3 Versus C4.5 Decision Tree Construction

Attribute Selection Measure

  • ID3: Uses Information Gain (based on Entropy). It is biased towards attributes with many unique values.
  • C4.5: Uses Gain Ratio. It normalizes Information Gain by "Split Info" to handle the bias towards attributes with many values.

Data Types

  • ID3: Primarily handles discrete/categorical attributes.
  • C4.5: Can handle both continuous (by finding split thresholds) and discrete attributes.

Missing Values

  • ID3: Cannot handle missing data effectively.
  • C4.5: Handles missing values, either by ignoring them in gain calculations or by distributing instances across branches in proportion to observed frequencies.

Pruning

  • ID3: Performs no pruning, so trees can grow large and overfit the training data.
  • C4.5: Includes post-pruning capabilities to optimize the tree and prevent overfitting, generating smaller trees than ID3.
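
To make the attribute-selection measures concrete, here is a minimal sketch (plain NumPy, with invented helper names and data) computing entropy, information gain, and gain ratio for one categorical attribute:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_and_ratio(attribute, labels):
    """Information gain (ID3's measure) and gain ratio (C4.5's measure)."""
    values, counts = np.unique(attribute, return_counts=True)
    weights = counts / counts.sum()
    # Weighted entropy of the partitions induced by the attribute.
    cond_entropy = sum(w * entropy(labels[attribute == v])
                       for w, v in zip(weights, values))
    gain = entropy(labels) - cond_entropy
    split_info = -np.sum(weights * np.log2(weights))  # penalizes many values
    return gain, gain / split_info if split_info > 0 else 0.0

# Made-up example: an "outlook" attribute vs. play/don't-play labels.
outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast"])
play = np.array([0, 0, 1, 1, 1])
print(info_gain_and_ratio(outlook, play))
```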

CART Algorithm for Decision Tree Construction

CART (Classification and Regression Trees) constructs binary trees using the Gini Index as the impurity measure.

  1. Calculate the Gini Index for the entire training dataset (T) based on the target attribute: Gini(T) = 1 − Σᵢ pᵢ².
  2. For each attribute, calculate the weighted Gini Index for its possible binary splits: Gini_split(T, A) = (|S₁|/|T|)·Gini(S₁) + (|S₂|/|T|)·Gini(S₂).
  3. Select the attribute and split condition that yields the minimum Gini Index.
  4. Create a decision node using the best split and divide the dataset into subsets (S1, S2).
  5. Recursively apply the process to the subsets until a stopping criterion is met.
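
A short sketch (plain Python, invented data) of steps 1-3, computing the Gini index of a dataset and of a candidate binary split:

```python
from collections import Counter

def gini(labels):
    """Gini(T) = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(s1, s2):
    """Weighted Gini of a binary split into subsets S1 and S2."""
    n = len(s1) + len(s2)
    return len(s1) / n * gini(s1) + len(s2) / n * gini(s2)

# Made-up labels: a split that separates the classes well scores lower.
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(gini(labels))                                           # 0.5 (mixed)
print(gini_split(["yes", "yes", "yes"], ["no", "no", "no"]))  # 0.0 (pure)
```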

Role of Activation Functions in Neural Networks

An activation function decides whether a neuron should "fire" (activate) based on the net sum of inputs and weights.

  • Non-Linearity: Its most critical role is introducing non-linearity. Without it, the network could only learn linear patterns.
  • Thresholding: It applies a rule to the net input (Net = Σᵢ wᵢxᵢ + b).
  • Normalization: It normalizes the output to a specific range, such as 0 to 1 (Sigmoid) or -1 to 1 (Tanh).
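
A minimal NumPy sketch of these three roles, applied to one neuron's net input (the weights, inputs, and bias are invented):

```python
import numpy as np

def step(net):
    return np.where(net > 0, 1, 0)      # hard threshold: fire or not

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))   # normalizes output into (0, 1)

def tanh(net):
    return np.tanh(net)                 # normalizes output into (-1, 1)

# Net input of one neuron: Net = sum(w_i * x_i) + b
w, x, b = np.array([0.4, -0.2]), np.array([1.0, 3.0]), 0.1
net = w @ x + b                         # = 0.4 - 0.6 + 0.1 = -0.1
print(step(net), sigmoid(net), tanh(net))
```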

Prior, Posterior, and Likelihood Probability

1. Prior Probability — P(h)

It is the probability of a hypothesis before seeing any evidence or data. It represents initial belief or background knowledge.

2. Likelihood — P(E | h)

It is the probability of observing the evidence given that the hypothesis is true. It measures how well the hypothesis explains the observed data.

3. Posterior Probability — P(h | E)

It is the updated probability of the hypothesis after taking the evidence into account. It combines prior belief and likelihood using Bayes' Theorem: P(h|E) = P(E|h) · P(h) / P(E).
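
A worked example helps (the numbers are invented): suppose a hypothesis has prior P(h) = 0.01, the evidence has likelihood P(E|h) = 0.9 under the hypothesis, and the evidence occurs with overall probability P(E) = 0.10:

```python
# Bayes' Theorem with invented numbers: P(h|E) = P(E|h) * P(h) / P(E)
prior = 0.01        # P(h): belief before seeing evidence
likelihood = 0.9    # P(E|h): probability of the evidence if h is true
evidence = 0.10     # P(E): overall probability of the evidence

posterior = likelihood * prior / evidence
print(posterior)    # 0.09 -- belief in h updated upward from 0.01
```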

Perceptron Definition and Power

A perceptron is the simplest neural network, consisting of a single neuron. It computes a weighted sum of its inputs and applies a step activation function for binary classification (output 0 or 1).

  • It can represent linearly separable functions like AND and OR.
  • It cannot represent non-linear functions like XOR.
  • Its decision boundary is always a straight line/hyperplane, limiting its representational power.
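
A minimal sketch (plain Python, using the classic perceptron learning rule) that learns the linearly separable AND function; the same loop never converges for XOR:

```python
# Perceptron: weighted sum + step activation, trained on AND.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]      # AND; try [0, 1, 1, 0] (XOR) and it fails

w1 = w2 = b = 0.0
lr = 0.1

for _ in range(20):         # a few epochs suffice for AND
    for (x1, x2), t in zip(inputs, targets):
        out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # step activation
        err = t - out
        w1 += lr * err * x1  # perceptron weight update rule
        w2 += lr * err * x2
        b += lr * err

print([(x, 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0) for x in inputs])
```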

Artificial Neural Network vs. Biological Learning

Artificial Neural Network (ANN)

An ANN is a computational model inspired by the human brain. It consists of interconnected nodes (neurons) arranged in layers. ANNs learn patterns from data by adjusting weights based on errors (e.g., using backpropagation).

Biological Learning Models

These models are inspired by the human brain and nervous system. Learning occurs by strengthening or weakening synapses based on experience. Hebbian Learning (“neurons that fire together, wire together”) is a key principle.
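
Hebb's rule can be sketched in one update line (the learning rate, activities, and initial weights below are invented): the weight between two neurons grows in proportion to their correlated activity.

```python
import numpy as np

eta = 0.1                          # learning rate (invented value)
x = np.array([1.0, 0.0, 1.0])      # pre-synaptic activity
w = np.array([0.2, 0.2, 0.2])      # initial synaptic strengths (invented)

for _ in range(5):
    y = w @ x                      # post-synaptic activity
    w += eta * y * x               # Hebb's rule: delta_w = eta * y * x

print(w)  # weights on co-active inputs (x=1) strengthen; the idle one does not
```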

CLIQUE Algorithm for High-Dimensional Clustering

CLIQUE is a grid-based and density-based clustering algorithm for high-dimensional data.

  1. Grid Partitioning: Divides the data space into a grid of non-overlapping rectangular units.
  2. Identification: Identifies "dense units"—grid cells exceeding a threshold (T).
  3. Apriori Property: Uses a bottom-up approach. If a k-dimensional unit is dense, all its (k-1) dimensional projections must also be dense.
  4. Cluster Formation: Adjacent dense units are connected to form the final clusters.
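
A minimal sketch of steps 1-2 (grid partitioning and dense-unit identification) in two dimensions, with invented data, cell width, and threshold:

```python
import numpy as np
from collections import Counter

# Invented 2-D data: uniform background plus one dense blob.
rng = np.random.default_rng(1)
background = rng.uniform(0, 10, size=(100, 2))
blob = rng.normal(loc=[2.5, 7.5], scale=0.5, size=(100, 2))
points = np.vstack([background, blob])

cell_width = 2.0   # grid resolution (invented)
threshold = 25     # minimum points for a "dense" unit (invented)

# Step 1: partition the space into a grid; assign each point to a cell.
cells = Counter(map(tuple, (points // cell_width).astype(int)))

# Step 2: identify the dense units.
dense_units = {cell for cell, count in cells.items() if count >= threshold}
print(dense_units)  # the cell holding the blob, e.g. {(1, 3)}

# Steps 3-4 (not shown): grow dense units to higher-dimensional subspaces
# via the apriori property, then connect adjacent dense units into clusters.
```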

Advantages

  • Automatically finds subspaces of the highest dimensionality.
  • Insensitive to the order of input records.
  • Scalable to large datasets.

Reinforcement Learning vs. Supervised Learning

Reinforcement Learning (RL)

  • Learns from interaction with the environment (trial and error).
  • No supervisor; feedback is delayed (rewards).
  • Decisions are sequential; goal is to maximize cumulative reward.

Supervised Learning

  • Learns from labeled data (input-output pairs).
  • Has a supervisor or teacher providing correct answers.
  • Feedback is immediate; goal is to minimize prediction error.

Types and Evaluation of Rewards in RL

The reward is the feedback from the environment based on the agent's action.

  • Immediate Reward (rₜ₊₁): The reward received right after taking an action, reflecting short-term benefit.
  • Long-Term (Delayed) Reward (Gₜ): The accumulated reward over time (e.g., winning a game).

Evaluation (Cumulative Reward)

The total return (Gₜ) is the sum of all future rewards, usually weighted by a discount factor (γ, with 0 ≤ γ ≤ 1) so that immediate rewards count more than distant ones: Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ⋯
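
A small sketch computing the discounted return for an invented reward sequence:

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Invented reward sequence: small penalties per step, big final win.
rewards = [-1, -1, -1, 100]
print(discounted_return(rewards, gamma=0.9))  # 70.19
```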

Clustering Versus Classification

Clustering (Unsupervised Learning)

  • Data: Uses unlabeled data; finds structure independently.
  • Process: Partitions data into meaningful groups through exploration.
  • Output: Forms clusters based on similarity.

Classification (Supervised Learning)

  • Data: Uses labeled data with predefined class targets.
  • Process: Maps input features to specific output labels using prior training.
  • Output: Assigns each input to a class/category.

Components of Reinforcement Learning (RL)

RL involves an agent learning decisions by interacting with an environment.

  • Agent: The learner or decision-maker.
  • Environment: The world where the agent operates.
  • State (S): The current situation or configuration of the environment.
  • Action (A): What the agent does, causing transitions between states.
  • Reward (R): The feedback signal (positive or negative) received after an action.
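
These components fit together in the standard agent-environment loop, sketched below with an invented toy environment (a 1-D walk toward a goal state):

```python
import random

# Invented toy environment: states 0..4, goal at state 4.
def step(state, action):
    next_state = max(0, min(4, state + action))   # environment transition
    reward = 1 if next_state == 4 else 0          # reward signal R
    return next_state, reward

state = 0                                         # initial state S
for t in range(10):
    action = random.choice([-1, +1])              # agent picks an action A
    state, reward = step(state, action)           # environment responds
    print(t, state, reward)
```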

Policy in Reinforcement Learning

A policy is a rule or strategy that tells an RL agent which action to take in each state.

Types of Policies

  • Deterministic Policy: Always selects one fixed action for a state.
  • Stochastic Policy: Chooses actions based on probabilities.
  • Optimal Policy: Gives the highest total (cumulative) reward.
  • Stationary Policy: Does not change over time.
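
Deterministic vs. stochastic policies, sketched with invented states, actions, and probabilities:

```python
import random

# Deterministic policy: one fixed action per state (invented mapping).
deterministic = {"low_battery": "recharge", "high_battery": "explore"}

def stochastic(state):
    # Stochastic policy: sample an action from a state-dependent
    # probability distribution (invented probabilities).
    probs = {"low_battery": [("recharge", 0.9), ("explore", 0.1)],
             "high_battery": [("recharge", 0.2), ("explore", 0.8)]}
    actions, weights = zip(*probs[state])
    return random.choices(actions, weights=weights)[0]

print(deterministic["low_battery"])  # always "recharge"
print(stochastic("low_battery"))     # usually "recharge", sometimes "explore"
```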
