Advanced Inference Techniques in AI and Machine Learning
Bayesian Probability and Conditional Independence
Conditional probability is defined as:
$$P(A|B) = \frac{P(A, B)}{P(B)}$$
Rewriting the joint probability $P(A, B) = P(B|A)P(A)$ gives Bayes' rule: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.
The notation $X \perp Y$ signifies that events $X$ and $Y$ are independent; $X \perp Y \mid Z$ denotes conditional independence given $Z$.
Applying Conditional Independence
When calculating the probability of a cause ($C$) given multiple effects ($M_1, M_2$), assuming $M_1$ and $M_2$ are conditionally independent given $C$:
$$P(C|M_1, M_2) = \frac{P(M_1, M_2|C)P(C)}{P(M_1, M_2)} = \frac{P(M_1|C)P(M_2|C)P(C)}{P(M_1, M_2)}$$
Example calculation, with $P(C) = 0.01$, $P(M_1|C) = P(M_2|C) = 0.8$, and $P(M_1|\neg C) = P(M_2|\neg C) = 0.096$:
- Numerator: $P(M_1|C)P(M_2|C)P(C) = (0.8)(0.8)(0.01) = 0.0064$
- Denominator (Law of Total Probability): $P(M_1, M_2) = P(M_1, M_2|C)P(C) + P(M_1, M_2|\neg C)P(\neg C)$
- Assuming conditional independence for $\neg C$: $P(M_1, M_2) = P(M_1|C)P(M_2|C)P(C) + P(M_1|\neg C)P(M_2|\neg C)P(\neg C)$
- $P(M_1, M_2) = (0.8)(0.8)(0.01) + (0.096)(0.096)(0.99) \approx 0.0064 + 0.0091 \approx 0.0155$
Thus, the posterior probability is: $P(C|M_1, M_2) = \frac{0.0064}{0.0155} \approx 0.41$ (i.e., about 41%).
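The same posterior can be checked programmatically. The following is a minimal Python sketch using the values assumed in the worked example above; the variable names are illustrative only.

```python
# Minimal sketch: posterior P(C | M1, M2) under conditional independence,
# using the values assumed in the worked example above.
p_c = 0.01               # prior P(C)
p_m_given_c = 0.8        # P(M1|C) = P(M2|C)
p_m_given_not_c = 0.096  # P(M1|~C) = P(M2|~C)

# Numerator: P(M1|C) * P(M2|C) * P(C)
numerator = p_m_given_c * p_m_given_c * p_c

# Denominator via the law of total probability, assuming conditional
# independence of M1 and M2 given C and given ~C.
denominator = numerator + p_m_given_not_c * p_m_given_not_c * (1 - p_c)

posterior = numerator / denominator
print(f"P(C | M1, M2) = {posterior:.4f}")  # ~0.41
```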
Law of Total Probability
The probability of an event ($+$) can be decomposed over a condition ($C$) and its negation ($\neg C$):
$$P(+) = P(+|C)P(C) + P(+|\neg C)P(\neg C)$$
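For instance, reusing the numbers from the example above with $+$ read as the single effect $M_1$:
$$P(M_1) = P(M_1|C)P(C) + P(M_1|\neg C)P(\neg C) = (0.8)(0.01) + (0.096)(0.99) \approx 0.103$$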
Statistical Estimation and Optimization
Maximum Likelihood Estimation (MLE)
The Likelihood function $L(K)$ for independent observations $X_1, X_2, \dots, X_n$ given parameters $K$ is the product of individual probabilities:
$$L(K) = P(X_1, X_2, \dots, X_n|K) = \prod_{i=1}^{n} P(X_i|K)$$
To simplify calculation, we often use the Log-Likelihood, which converts the product into a sum:
$$\log L(K) = \sum_{i=1}^{n} \log P(X_i|K)$$
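As an illustration, the Python sketch below estimates the bias of a coin by maximizing the log-likelihood over a grid of candidate parameter values; the data and grid are made up for the example, and the result agrees with the closed-form MLE $k/n$.

```python
import math

# Hypothetical observations: 1 = heads, 0 = tails (7 heads out of 10).
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(theta, xs):
    """Bernoulli log-likelihood: sum_i log P(x_i | theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# Grid search over candidate parameter values (excluding 0 and 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_mle)              # ~0.7
print(sum(data) / len(data))  # closed-form MLE k/n = 0.7
```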
Example Log-Likelihood Function
To find the parameter $\mu$ that maximizes the probability of observed counts $(a, b, c, d)$, we take the log of the probability and differentiate:
$$\log P(a, b, c, d | \mu) = \log K + a \log \frac{1}{2} + b \log \mu + c \log 2\mu + d \log (\frac{1}{2} - 3\mu)$$
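Filling in the differentiation step (the $\log K$ and $a \log \frac{1}{2}$ terms do not depend on $\mu$):
$$\frac{\partial}{\partial \mu} \log P = \frac{b}{\mu} + \frac{c}{\mu} - \frac{3d}{\frac{1}{2} - 3\mu} = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{b + c}{6(b + c + d)}$$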
Complex Likelihood Function Example
A complex likelihood function $L(\theta, p, q)$ based on counts $m_i, f_i, m_f, f_f, b$:
$$L(\theta, p, q) = (\theta p)^{m_i} \cdot (\theta (1 - p))^{f_i} \cdot ((1 - \theta)q^2)^{m_f} \cdot ((1 - \theta)(1 - q)^2)^{f_f} \cdot ((1 - \theta) \cdot 2q(1 - q))^b$$
This simplifies to:
$$L(\theta, p, q) = \theta^{(m_i+f_i)} (1 - \theta)^{(m_f + f_f + b)} \cdot p^{m_i} (1 - p)^{f_i} \cdot q^{2m_f} (1 - q)^{2f_f} (2q(1 - q))^b$$
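Because the simplified form separates the three parameters, each factor can be maximized independently. Taking logs and setting the derivatives to zero gives the following estimators, sketched directly from the factorization above:
$$\hat{\theta} = \frac{m_i + f_i}{m_i + f_i + m_f + f_f + b}, \qquad \hat{p} = \frac{m_i}{m_i + f_i}, \qquad \hat{q} = \frac{2m_f + b}{2(m_f + f_f + b)}$$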
Maximum A Posteriori (MAP) Estimation
MAP estimation seeks the hypothesis ($h$) that maximizes the posterior probability $P(h|D)$; under a 0-1 loss, choosing the MAP hypothesis minimizes the expected loss (the probability of error).
$$P(h|D) = \frac{P(D|h)P(h)}{P(D)}$$
The MAP hypothesis is found by returning $\arg\max_h P(D|h)P(h)$, since the evidence term $P(D)$ does not depend on $h$.
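A minimal Python sketch of MAP estimation over a small, made-up hypothesis space (candidate coin biases with a non-uniform prior); the numbers are illustrative only.

```python
import math

# Hypothetical candidate hypotheses: coin biases, with a prior favouring a fair coin.
hypotheses = {0.3: 0.2, 0.5: 0.6, 0.8: 0.2}   # h -> P(h)
data = [1, 1, 0, 1, 1, 1, 0, 1]               # observed flips (1 = heads)

def log_likelihood(theta, xs):
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# MAP: argmax_h P(D|h) P(h); P(D) is constant across h and can be ignored.
h_map = max(hypotheses, key=lambda h: log_likelihood(h, data) + math.log(hypotheses[h]))
print(h_map)  # ML alone would pick 0.8, but the prior pulls the MAP choice to 0.5
```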
Expectation-Maximization (EM) Algorithm
Handling Missing Data and Joint Distributions
The EM algorithm is used when data is incomplete or contains hidden variables. As in the classic coin-mixture example, the process iteratively assigns probabilities to the hidden values and updates the parameters until the likelihood converges.
In cases involving hidden values (e.g., $A$'s and $B$'s represented by $h$), one starts from a guess for the hidden variable (e.g., $B$) or for the parameters, then alternates between the Expectation (E) step, which computes the expected values of the hidden variables under the current parameters, and the Maximization (M) step, which re-estimates the parameters given those expectations.
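A minimal Python sketch of EM for the two-coin setting hinted at above (the coin identity, A or B, behind each block of flips is hidden). The data and initial guesses are made up, and each coin is assumed to be chosen with equal probability.

```python
# Each entry: number of heads in a block of 10 flips; which coin (A or B)
# produced the block is hidden. Data and initial guesses are illustrative only.
heads = [5, 9, 8, 4, 7]
n = 10
theta_a, theta_b = 0.6, 0.5   # initial parameter guesses

def binom_lik(theta, h, n):
    """Likelihood of h heads in n flips (binomial coefficient omitted: it cancels)."""
    return theta**h * (1 - theta)**(n - h)

for _ in range(50):
    # E-step: responsibility that each block came from coin A (equal coin prior assumed).
    resp_a = []
    for h in heads:
        la, lb = binom_lik(theta_a, h, n), binom_lik(theta_b, h, n)
        resp_a.append(la / (la + lb))
    # M-step: re-estimate each bias from the expected head and flip counts.
    theta_a = sum(r * h for r, h in zip(resp_a, heads)) / sum(r * n for r in resp_a)
    theta_b = sum((1 - r) * h for r, h in zip(resp_a, heads)) / sum((1 - r) * n for r in resp_a)

print(round(theta_a, 3), round(theta_b, 3))
```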
Inference and Learning Algorithms
Brute Force Probability Calculation
To find the maximum-probability hypothesis by brute force, compute the posterior for every hypothesis ($h$), as sketched below:
- For each $h$, compute $P(D|h)P(h)$ and normalize by the sum over all hypotheses consistent with the data.
- Return the hypothesis with the maximum posterior probability.
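A minimal Python sketch of the brute-force computation, assuming a made-up hypothesis space of threshold classifiers on a toy 1-D dataset; under a uniform prior, each hypothesis consistent with the data receives posterior $1/|\text{consistent set}|$.

```python
# Toy data: (x, label); hypothesis h_t classifies x as 1 iff x >= t.
data = [(1, 0), (2, 0), (4, 1), (5, 1)]
thresholds = [0, 1, 2, 3, 4, 5, 6]   # candidate hypotheses, uniform prior

def consistent(t, data):
    return all((x >= t) == bool(y) for x, y in data)

# Posterior under a uniform prior: 1/|consistent set| for consistent h, 0 otherwise.
consistent_set = [t for t in thresholds if consistent(t, data)]
posterior = {t: (1 / len(consistent_set) if t in consistent_set else 0.0)
             for t in thresholds}

print(posterior)                          # mass spread over t = 3 and t = 4
print(max(posterior, key=posterior.get))  # a maximum-posterior hypothesis
```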
Naive Bayes Classifier
In the context of the "Play Tennis" example, the Naive Bayes classifier calculates the probability of a class ($Y$) given features (Weather $W$, Temperature $Temp$) by assuming conditional independence of features:
$$P(Y|W, Temp) \propto P(Y) \cdot P(W|Y) \cdot P(Temp|Y)$$
This calculation is performed for both "Yes" and "No" classes, and the class with the maximum resulting probability is returned.
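A minimal Python sketch of this computation for a Play Tennis-style dataset; the mini-table below is made up for the example (it is not the full textbook table), and no smoothing is applied, so unseen feature values give zero probability.

```python
from collections import Counter

# Hypothetical mini-dataset: (Weather, Temp, PlayTennis).
data = [
    ("sunny", "hot", "no"), ("sunny", "mild", "no"), ("overcast", "hot", "yes"),
    ("rain", "mild", "yes"), ("rain", "cool", "yes"), ("sunny", "cool", "yes"),
    ("overcast", "mild", "yes"), ("rain", "hot", "no"),
]

def naive_bayes(weather, temp, data):
    """Return the class maximizing P(Y) * P(W|Y) * P(Temp|Y)."""
    classes = Counter(y for *_, y in data)
    scores = {}
    for y, n_y in classes.items():
        p_w = sum(1 for w, _, c in data if c == y and w == weather) / n_y
        p_t = sum(1 for _, t, c in data if c == y and t == temp) / n_y
        scores[y] = (n_y / len(data)) * p_w * p_t
    return max(scores, key=scores.get), scores

print(naive_bayes("sunny", "cool", data))
```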
Decision Tree Pruning Techniques
Two common methods for simplifying decision trees and preventing overfitting are:
- Reduced Error Pruning: Pruning nodes based on performance on a separate validation set.
- Rule Post-Pruning: Converting the tree into a set of rules (one per root-to-leaf path), pruning each rule separately by dropping preconditions that do not hurt estimated accuracy, and then sorting the pruned rules by estimated accuracy to fix the order in which they are applied during classification.
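A minimal sketch of reduced error pruning, assuming a simple dict-based tree representation invented for this example (leaves are {"leaf": label}; internal nodes store a splitting feature, branches keyed by value, and a majority label):

```python
def classify(node, x):
    """Follow branches until a leaf; fall back to the majority label for unseen values."""
    while "leaf" not in node:
        node = node["branches"].get(x[node["feature"]], {"leaf": node["majority"]})
    return node["leaf"]

def accuracy(root, data):
    return sum(classify(root, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, root, val_data):
    """Bottom-up: replace a subtree by its majority label unless validation accuracy drops."""
    if "leaf" in node:
        return
    for child in node["branches"].values():
        reduced_error_prune(child, root, val_data)
    saved = dict(node)
    before = accuracy(root, val_data)
    node.clear()
    node["leaf"] = saved["majority"]        # tentatively prune this node
    if accuracy(root, val_data) < before:   # pruning hurt validation accuracy: undo it
        node.clear()
        node.update(saved)
```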
Probability Computation via Weighted Sampling
Calculating Marginal Probability
To compute the marginal probability $P(\text{Burglary}=\text{true})$ using weighted samples, we sum the weights for every sample where $\text{Burglary}=\text{true}$ and divide by the sum of all sample weights.
Calculating Conditional Probability
To compute the conditional probability $P(X=\text{true} \mid Y=\text{true})$:
Sum the weights of all samples where both $X=\text{true}$ and $Y=\text{true}$, and divide by the total weight of all samples where $Y=\text{true}$.
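A minimal Python sketch of both estimates from a list of weighted samples; the sample list is made up for illustration, with each sample stored as an (assignment, weight) pair as might be produced by likelihood weighting.

```python
# Hypothetical weighted samples: (assignment, weight) pairs.
samples = [
    ({"Burglary": True,  "Alarm": True},  0.12),
    ({"Burglary": False, "Alarm": True},  0.45),
    ({"Burglary": True,  "Alarm": False}, 0.03),
    ({"Burglary": False, "Alarm": False}, 0.70),
]

def marginal(samples, var):
    """P(var = true): weight of samples with var true, over total weight."""
    total = sum(w for _, w in samples)
    return sum(w for s, w in samples if s[var]) / total

def conditional(samples, x, y):
    """P(x = true | y = true): restrict to samples with y true, then weight-average x."""
    w_y = sum(w for s, w in samples if s[y])
    return sum(w for s, w in samples if s[x] and s[y]) / w_y

print(marginal(samples, "Burglary"))
print(conditional(samples, "Burglary", "Alarm"))
```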
Knowledge Representation and Logic
First-Order Logic (FOL) Statements
A set of FOL statements describing relationships between dogs, owners, animal lovers, and killing:
- $\exists x: \text{Dog}(x) \land \text{Owns}(\text{Jack}, x)$
- $\forall x: (\exists y: \text{Dog}(y) \land \text{Owns}(x, y)) \implies \text{AnimalLover}(x)$
- $\forall x: \text{AnimalLover}(x) \implies (\forall y: \text{Animal}(y) \implies \neg \text{Kills}(x, y))$
- $\text{Kills}(\text{Jack}, \text{Tuna}) \lor \text{Kills}(\text{Curiosity}, \text{Tuna})$
- $\text{Cat}(\text{Tuna})$
- $\forall x: \text{Cat}(x) \implies \text{Animal}(x)$
Conjunctive Normal Form (CNF) Clauses
The corresponding CNF clauses for resolution inference (using constants $D$ and $Tuna$):
A1. $(\text{Dog}(D))$
A2. $(\text{Owns}(\text{Jack}, D))$
B. $(\neg \text{Dog}(y) \lor \neg \text{Owns}(x, y) \lor \text{AnimalLover}(x))$
C. $(\neg \text{AnimalLover}(a) \lor \neg \text{Animal}(b) \lor \neg \text{Kills}(a, b))$
D. $(\text{Kills}(\text{Jack}, \text{Tuna}) \lor \text{Kills}(\text{Curiosity}, \text{Tuna}))$
E. $(\text{Cat}(\text{Tuna}))$
F. $(\neg \text{Cat}(z) \lor \text{Animal}(z))$
Negated Goal (for proof by contradiction): $\neg G: (\neg \text{Kills}(\text{Curiosity}, \text{Tuna}))$
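A sketch of one possible resolution refutation from these clauses (unifying substitutions shown in braces):
- Resolve E with F, $\{z/\text{Tuna}\}$: $\text{Animal}(\text{Tuna})$.
- Resolve D with $\neg G$: $\text{Kills}(\text{Jack}, \text{Tuna})$.
- Resolve that with C, $\{a/\text{Jack}, b/\text{Tuna}\}$: $\neg \text{AnimalLover}(\text{Jack}) \lor \neg \text{Animal}(\text{Tuna})$.
- Resolve with $\text{Animal}(\text{Tuna})$: $\neg \text{AnimalLover}(\text{Jack})$.
- Resolve A1 with B, $\{y/D\}$, then A2, $\{x/\text{Jack}\}$: $\text{AnimalLover}(\text{Jack})$.
- Resolving the last two clauses yields the empty clause, so $\text{Kills}(\text{Curiosity}, \text{Tuna})$ follows.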
Prolog Syntax
Prolog uses facts and rules for logical deduction:
- Fact:
man(socrates).
- Rule:
mortal(X) :- man(X).
(Meaning: X is mortal if X is a man.)
- Query:
?- mortal(socrates).
Note that Prolog uses commas for conjunction (AND) and semicolons for disjunction (OR).