Advanced Inference Techniques in AI and Machine Learning
Bayesian Probability and Conditional Independence
Conditional probability is defined as:
$$P(A|B) = \frac{P(A, B)}{P(B)}$$
Rewriting the joint probability $P(A, B) = P(B|A)P(A)$ gives Bayes' rule: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.
The notation $X \perp Y$ signifies that events $X$ and $Y$ are independent; $X \perp Y \mid Z$ denotes conditional independence given $Z$.
Applying Conditional Independence
When calculating the probability of a cause ($C$) given multiple effects ($M_1, M_2$), assuming $M_1$ and $M_2$ are conditionally independent given $C$:
$$P(C|M_1, M_2) = \frac{P(M_1, M_2|C)P(C)}{P(M_1, M_2)} = \frac{P(M_1|C)P(M_2|C)P(C)}{P(M_1, M_2)}$$
Example calculation, with $P(C) = 0.01$, $P(M_1|C) = P(M_2|C) = 0.8$, and $P(M_1|\neg C) = P(M_2|\neg C) = 0.096$:
- Numerator: $P(M_1|C)P(M_2|C)P(C) = (0.8)(0.8)(0.01) = 0.0064$
- Denominator (Law of Total Probability): $P(M_1, M_2) = P(M_1, M_2|C)P(C) + P(M_1, M_2|\neg C)P(\neg C)$
- Assuming conditional independence for $\neg C$: $P(M_1, M_2) = P(M_1|C)P(M_2|C)P(C) + P(M_1|\neg C)P(M_2|\neg C)P(\neg C)$
- $P(M_1, M_2) = (0.8)(0.8)(0.01) + (0.096)(0.096)(0.99) \approx 0.0064 + 0.0091 \approx 0.0155$
Thus, the posterior probability is: $P(C|M_1, M_2) = \frac{0.0064}{0.0155} \approx 0.41$ (i.e., about 41%).
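The same posterior can be checked programmatically. The following is a minimal Python sketch using the values assumed in the worked example above; the variable names are illustrative only.

```python
# Minimal sketch: posterior P(C | M1, M2) under conditional independence,
# using the values assumed in the worked example above.
p_c = 0.01               # prior P(C)
p_m_given_c = 0.8        # P(M1|C) = P(M2|C)
p_m_given_not_c = 0.096  # P(M1|~C) = P(M2|~C)

# Numerator: P(M1|C) * P(M2|C) * P(C)
numerator = p_m_given_c * p_m_given_c * p_c

# Denominator via the law of total probability, assuming conditional
# independence of M1 and M2 given C and given ~C.
denominator = numerator + p_m_given_not_c * p_m_given_not_c * (1 - p_c)

posterior = numerator / denominator
print(f"P(C | M1, M2) = {posterior:.4f}")  # ~0.41
```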
Law of Total Probability
The probability of an event ($+$) can be decomposed over a condition ($C$) and its negation ($\neg C$):
$$P(+) = P(+|C)P(C) + P(+|\neg C)P(\neg C)$$
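For instance, reusing the numbers from the example above with $+$ read as the single effect $M_1$:
$$P(M_1) = P(M_1|C)P(C) + P(M_1|\neg C)P(\neg C) = (0.8)(0.01) + (0.096)(0.99) \approx 0.103$$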
Statistical Estimation and Optimization
Maximum Likelihood Estimation (MLE)
The Likelihood function $L(K)$ for independent observations $X_1, X_2, \dots, X_n$ given parameters $K$ is the product of individual probabilities:
$$L(K) = P(X_1, X_2, \dots, X_n|K) = \prod_{i=1}^{n} P(X_i|K)$$
To simplify calculation, we often use the Log-Likelihood, which converts the product into a sum:
$$\log L(K) = \sum_{i=1}^{n} \log P(X_i|K)$$
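As an illustration, the Python sketch below estimates the bias of a coin by maximizing the log-likelihood over a grid of candidate parameter values; the data and grid are made up for the example, and the result agrees with the closed-form MLE $k/n$.

```python
import math

# Hypothetical observations: 1 = heads, 0 = tails (7 heads out of 10).
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(theta, xs):
    """Bernoulli log-likelihood: sum_i log P(x_i | theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# Grid search over candidate parameter values (excluding 0 and 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_mle)              # ~0.7
print(sum(data) / len(data))  # closed-form MLE k/n = 0.7
```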
Example Log-Likelihood Function
To find the parameter $\mu$ that maximizes the probability of observed counts $(a, b, c, d)$, we take the log of the probability and differentiate:
$$\log P(a, b, c, d | \mu) = \log K + a \log \frac{1}{2} + b \log \mu + c \log 2\mu + d \log (\frac{1}{2} - 3\mu)$$
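Filling in the differentiation step (the $\log K$ and $a \log \frac{1}{2}$ terms do not depend on $\mu$):
$$\frac{\partial}{\partial \mu} \log P = \frac{b}{\mu} + \frac{c}{\mu} - \frac{3d}{\frac{1}{2} - 3\mu} = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{b + c}{6(b + c + d)}$$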
Complex Likelihood Function Example
A complex likelihood function $L(\theta, p, q)$ based on counts $m_i, f_i, m_f, f_f, b$:
$$L(\theta, p, q) = (\theta p)^{m_i} \cdot (\theta (1 - p))^{f_i} \cdot ((1 - \theta)q^2)^{m_f} \cdot ((1 - \theta)(1 - q)^2)^{f_f} \cdot ((1 - \theta) \cdot 2q(1 - q))^b$$
This simplifies to:
$$L(\theta, p, q) = \theta^{(m_i+f_i)} (1 - \theta)^{(m_f + f_f + b)} \cdot p^{m_i} (1 - p)^{f_i} \cdot q^{2m_f} (1 - q)^{2f_f} (2q(1 - q))^b$$
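Because the simplified form separates the three parameters, each factor can be maximized independently. Taking logs and setting the derivatives to zero gives the following estimators, sketched directly from the factorization above:
$$\hat{\theta} = \frac{m_i + f_i}{m_i + f_i + m_f + f_f + b}, \qquad \hat{p} = \frac{m_i}{m_i + f_i}, \qquad \hat{q} = \frac{2m_f + b}{2(m_f + f_f + b)}$$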
Maximum A Posteriori (MAP) Estimation
MAP estimation seeks the hypothesis ($h$) that maximizes the posterior probability $P(h|D)$; under a 0-1 loss, choosing the MAP hypothesis minimizes the expected loss (the probability of error).
$$P(h|D) = \frac{P(D|h)P(h)}{P(D)}$$
The MAP hypothesis is found by returning $\arg\max_h P(D|h)P(h)$, since the evidence term $P(D)$ does not depend on $h$.
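A minimal Python sketch of MAP estimation over a small, made-up hypothesis space (candidate coin biases with a non-uniform prior); the numbers are illustrative only.

```python
import math

# Hypothetical candidate hypotheses: coin biases, with a prior favouring a fair coin.
hypotheses = {0.3: 0.2, 0.5: 0.6, 0.8: 0.2}   # h -> P(h)
data = [1, 1, 0, 1, 1, 1, 0, 1]               # observed flips (1 = heads)

def log_likelihood(theta, xs):
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# MAP: argmax_h P(D|h) P(h); P(D) is constant across h and can be ignored.
h_map = max(hypotheses, key=lambda h: log_likelihood(h, data) + math.log(hypotheses[h]))
print(h_map)  # ML alone would pick 0.8, but the prior pulls the MAP choice to 0.5
```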
Expectation-Maximization (EM) Algorithm
Handling Missing Data and Joint Distributions
The EM algorithm is used when data is incomplete or contains hidden variables. As in the classic coin-mixture example, the process iteratively assigns probabilities to the hidden values and updates the parameters until the likelihood converges.
In cases involving hidden values (e.g., $A$'s and $B$'s represented by $h$), one starts from a guess for the hidden variable (e.g., $B$) or for the parameters, then alternates between the Expectation (E) step, which computes the expected values of the hidden variables under the current parameters, and the Maximization (M) step, which re-estimates the parameters given those expectations.
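A minimal Python sketch of EM for the two-coin setting hinted at above (the coin identity, A or B, behind each block of flips is hidden). The data and initial guesses are made up, and each coin is assumed to be chosen with equal probability.

```python
# Each entry: number of heads in a block of 10 flips; which coin (A or B)
# produced the block is hidden. Data and initial guesses are illustrative only.
heads = [5, 9, 8, 4, 7]
n = 10
theta_a, theta_b = 0.6, 0.5   # initial parameter guesses

def binom_lik(theta, h, n):
    """Likelihood of h heads in n flips (binomial coefficient omitted: it cancels)."""
    return theta**h * (1 - theta)**(n - h)

for _ in range(50):
    # E-step: responsibility that each block came from coin A (equal coin prior assumed).
    resp_a = []
    for h in heads:
        la, lb = binom_lik(theta_a, h, n), binom_lik(theta_b, h, n)
        resp_a.append(la / (la + lb))
    # M-step: re-estimate each bias from the expected head and flip counts.
    theta_a = sum(r * h for r, h in zip(resp_a, heads)) / sum(r * n for r in resp_a)
    theta_b = sum((1 - r) * h for r, h in zip(resp_a, heads)) / sum((1 - r) * n for r in resp_a)

print(round(theta_a, 3), round(theta_b, 3))
```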
Inference and Learning Algorithms
Brute Force Probability Calculation
To find the maximum-probability hypothesis by brute force, compute the posterior for every hypothesis ($h$), as sketched below:
- For each $h$, compute $P(D|h)P(h)$ and normalize by the sum over all hypotheses consistent with the data.
- Return the hypothesis with the maximum posterior probability.
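A minimal Python sketch of the brute-force computation, assuming a made-up hypothesis space of threshold classifiers on a toy 1-D dataset; under a uniform prior, each hypothesis consistent with the data receives posterior $1/|\text{consistent set}|$.

```python
# Toy data: (x, label); hypothesis h_t classifies x as 1 iff x >= t.
data = [(1, 0), (2, 0), (4, 1), (5, 1)]
thresholds = [0, 1, 2, 3, 4, 5, 6]   # candidate hypotheses, uniform prior

def consistent(t, data):
    return all((x >= t) == bool(y) for x, y in data)

# Posterior under a uniform prior: 1/|consistent set| for consistent h, 0 otherwise.
consistent_set = [t for t in thresholds if consistent(t, data)]
posterior = {t: (1 / len(consistent_set) if t in consistent_set else 0.0)
             for t in thresholds}

print(posterior)                          # mass spread over t = 3 and t = 4
print(max(posterior, key=posterior.get))  # a maximum-posterior hypothesis
```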
Naive Bayes Classifier
In the context of the "Play Tennis" example, the Naive Bayes classifier calculates the probability of a class ($Y$) given features (Weather $W$, Temperature $Temp$) by assuming conditional independence of features:
$$P(Y|W, Temp) \propto P(Y) \cdot P(W|Y) \cdot P(Temp|Y)$$
This calculation is performed for both "Yes" and "No" classes, and the class with the maximum resulting probability is returned.
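A minimal Python sketch of this computation for a Play Tennis-style dataset; the mini-table below is made up for the example (it is not the full textbook table), and no smoothing is applied, so unseen feature values give zero probability.

```python
from collections import Counter

# Hypothetical mini-dataset: (Weather, Temp, PlayTennis).
data = [
    ("sunny", "hot", "no"), ("sunny", "mild", "no"), ("overcast", "hot", "yes"),
    ("rain", "mild", "yes"), ("rain", "cool", "yes"), ("sunny", "cool", "yes"),
    ("overcast", "mild", "yes"), ("rain", "hot", "no"),
]

def naive_bayes(weather, temp, data):
    """Return the class maximizing P(Y) * P(W|Y) * P(Temp|Y)."""
    classes = Counter(y for *_, y in data)
    scores = {}
    for y, n_y in classes.items():
        p_w = sum(1 for w, _, c in data if c == y and w == weather) / n_y
        p_t = sum(1 for _, t, c in data if c == y and t == temp) / n_y
        scores[y] = (n_y / len(data)) * p_w * p_t
    return max(scores, key=scores.get), scores

print(naive_bayes("sunny", "cool", data))
```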
Decision Tree Pruning Techniques
Two common methods for simplifying decision trees and preventing overfitting are:
- Reduced Error Pruning: Pruning nodes based on performance on a separate validation set.
- Rule Post-Pruning: Converting the tree into a set of rules (one per root-to-leaf path), pruning each rule separately by dropping preconditions that do not hurt estimated accuracy, and then sorting the pruned rules by estimated accuracy to fix the order in which they are applied during classification.
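A minimal sketch of reduced error pruning, assuming a simple dict-based tree representation invented for this example (leaves are {"leaf": label}; internal nodes store a splitting feature, branches keyed by value, and a majority label):

```python
def classify(node, x):
    """Follow branches until a leaf; fall back to the majority label for unseen values."""
    while "leaf" not in node:
        node = node["branches"].get(x[node["feature"]], {"leaf": node["majority"]})
    return node["leaf"]

def accuracy(root, data):
    return sum(classify(root, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, root, val_data):
    """Bottom-up: replace a subtree by its majority label unless validation accuracy drops."""
    if "leaf" in node:
        return
    for child in node["branches"].values():
        reduced_error_prune(child, root, val_data)
    saved = dict(node)
    before = accuracy(root, val_data)
    node.clear()
    node["leaf"] = saved["majority"]        # tentatively prune this node
    if accuracy(root, val_data) < before:   # pruning hurt validation accuracy: undo it
        node.clear()
        node.update(saved)
```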
Probability Computation via Weighted Sampling
Calculating Marginal Probability
To compute the marginal probability $P(\text{Burglary}=\text{true})$ using weighted samples, we sum the weights for every sample where $\text{Burglary}=\text{true}$ and divide by the sum of all sample weights.
Calculating Conditional Probability
To compute the conditional probability $P(X=\text{true} \mid Y=\text{true})$:
Sum the weights of all samples where both $X=\text{true}$ and $Y=\text{true}$, and divide by the total weight of all samples where $Y=\text{true}$.
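A minimal Python sketch of both estimates from a list of weighted samples; the sample list is made up for illustration, with each sample stored as an (assignment, weight) pair as might be produced by likelihood weighting.

```python
# Hypothetical weighted samples: (assignment, weight) pairs.
samples = [
    ({"Burglary": True,  "Alarm": True},  0.12),
    ({"Burglary": False, "Alarm": True},  0.45),
    ({"Burglary": True,  "Alarm": False}, 0.03),
    ({"Burglary": False, "Alarm": False}, 0.70),
]

def marginal(samples, var):
    """P(var = true): weight of samples with var true, over total weight."""
    total = sum(w for _, w in samples)
    return sum(w for s, w in samples if s[var]) / total

def conditional(samples, x, y):
    """P(x = true | y = true): restrict to samples with y true, then weight-average x."""
    w_y = sum(w for s, w in samples if s[y])
    return sum(w for s, w in samples if s[x] and s[y]) / w_y

print(marginal(samples, "Burglary"))
print(conditional(samples, "Burglary", "Alarm"))
```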
Knowledge Representation and Logic
First-Order Logic (FOL) Statements
A set of FOL statements describing relationships between dogs, owners, animal lovers, and killing:
- $\exists x: \text{Dog}(x) \land \text{Owns}(\text{Jack}, x)$
- $\forall x: (\exists y: \text{Dog}(y) \land \text{Owns}(x, y)) \implies \text{AnimalLover}(x)$
- $\forall x: \text{AnimalLover}(x) \implies (\forall y: \text{Animal}(y) \implies \neg \text{Kills}(x, y))$
- $\text{Kills}(\text{Jack}, \text{Tuna}) \lor \text{Kills}(\text{Curiosity}, \text{Tuna})$
- $\text{Cat}(\text{Tuna})$
- $\forall x: \text{Cat}(x) \implies \text{Animal}(x)$
Conjunctive Normal Form (CNF) Clauses
The corresponding CNF clauses for resolution inference (using constants $D$ and $Tuna$):
A1. $(\text{Dog}(D))$
A2. $(\text{Owns}(\text{Jack}, D))$
B. $(\neg \text{Dog}(y) \lor \neg \text{Owns}(x, y) \lor \text{AnimalLover}(x))$
C. $(\neg \text{AnimalLover}(a) \lor \neg \text{Animal}(b) \lor \neg \text{Kills}(a, b))$
D. $(\text{Kills}(\text{Jack}, \text{Tuna}) \lor \text{Kills}(\text{Curiosity}, \text{Tuna}))$
E. $(\text{Cat}(\text{Tuna}))$
F. $(\neg \text{Cat}(z) \lor \text{Animal}(z))$
Negated Goal (for proof by contradiction): $\neg G: (\neg \text{Kills}(\text{Curiosity}, \text{Tuna}))$
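A sketch of one possible resolution refutation from these clauses (unifying substitutions shown in braces):
- Resolve E with F, $\{z/\text{Tuna}\}$: $\text{Animal}(\text{Tuna})$.
- Resolve D with $\neg G$: $\text{Kills}(\text{Jack}, \text{Tuna})$.
- Resolve that with C, $\{a/\text{Jack}, b/\text{Tuna}\}$: $\neg \text{AnimalLover}(\text{Jack}) \lor \neg \text{Animal}(\text{Tuna})$.
- Resolve with $\text{Animal}(\text{Tuna})$: $\neg \text{AnimalLover}(\text{Jack})$.
- Resolve A1 with B, $\{y/D\}$, then A2, $\{x/\text{Jack}\}$: $\text{AnimalLover}(\text{Jack})$.
- Resolving the last two clauses yields the empty clause, so $\text{Kills}(\text{Curiosity}, \text{Tuna})$ follows.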
Prolog Syntax
Prolog uses facts and rules for logical deduction:
- Fact:
man(socrates).
- Rule:
mortal(X) :- man(X).
(Meaning: X is mortal if X is a man.)
- Query:
?- mortal(socrates).
Note that Prolog uses commas for conjunction (AND) and semicolons for disjunction (OR).