Notes on Machine Learning 13: Graphical Models
by 장승환
(ML 13.1) (ML 13.2) Directed graphical models - introductory examples
(Directed) “Graphical Models”, aka “Bayesian networks”
A better name would be “conditional independence diagrams” of probability distributions
Key notions:
- factorization of probability distributions
- notational device
- useful for visualization of
(a) conditional independence properties
(b) inference algorithms (DP, MCMC)
Why is conditional independence important? It makes inference tractable.
Thinking “Graphically”
Let $A, B, C$ be random variables.
$p(a, b,c) = p(c\vert a, b)p(a, b) = p(c\vert a, b)p(b\vert a)p(a)$
If $B$ and $C$ are independent given $A$, then $p(c\vert a, b) = p(c\vert a)$, so the joint simplifies to $p(a, b, c) = p(a)p(b\vert a)p(c\vert a)$.
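A quick numeric sanity check of this simplification (a minimal sketch; the binary variables and the particular tables below are made up for illustration):

```python
import numpy as np

# Hypothetical conditional tables for binary A, B, C (illustration only).
p_a = np.array([0.3, 0.7])                   # p(a)
p_b_given_a = np.array([[0.9, 0.1],          # p(b | a), rows indexed by a
                        [0.2, 0.8]])
p_c_given_a = np.array([[0.5, 0.5],          # p(c | a), rows indexed by a
                        [0.6, 0.4]])

# Joint under "B and C independent given A": p(a, b, c) = p(a) p(b|a) p(c|a)
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_a[:, None, :]
assert np.isclose(joint.sum(), 1.0)

# Verify p(c | a, b) = p(c | a) for every (a, b).
p_ab = joint.sum(axis=2)                     # p(a, b)
p_c_given_ab = joint / p_ab[:, :, None]      # p(c | a, b)
for a in range(2):
    for b in range(2):
        assert np.allclose(p_c_given_ab[a, b], p_c_given_a[a])
print("p(c | a, b) == p(c | a) for all a, b")
```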
(ML 13.3) (ML 13.4) Directed graphical models - formalism
Notation.
- DAG (Directed Acyclic Graph) : no directed cycles
- $pa(i) =$ parents of vertex $i$
Definition. Given $(X_1, \ldots, X_n) \sim p$ (pmf or pdf) and an ordered DAG $G$ on $n$ vertices, we say $X$ respects $G$ (or $p$ respects $G$) if $p(x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i\vert x_{pa(i)})$.
Remark. The factorization does not imply that any particular random variables are conditionally dependent.
Example. $X = (X_1, X_2, X_3)$ mutually independent
Then $p(x_1, x_2, x_3) = p(x_1)p(x_2)p(x_3)$
Terminology. A complete directed graph: one with an edge between every pair of vertices.
Example. Any distribution respects any complete DAG, since by the chain rule $p(x_1, \ldots, x_n) = p(x_1)p(x_2\vert x_1)\cdots p(x_n\vert x_1, \ldots, x_{n-1})$.
Remark.
- Can combine random variables into vectors
- If the factors are normalized conditional distributions, then the product is normalized also. (Exercise)
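One way to see the exercise, using the ordering of the DAG (so $x_n$ appears only in its own factor): $\sum_{x_n} \prod_{i=1}^{n} p(x_i\vert x_{pa(i)}) = \big(\prod_{i=1}^{n-1} p(x_i\vert x_{pa(i)})\big)\sum_{x_n} p(x_n\vert x_{pa(n)}) = \prod_{i=1}^{n-1} p(x_i\vert x_{pa(i)})$, since each conditional sums (or integrates) to $1$; repeating for $x_{n-1}, \ldots, x_1$ gives total mass $1$.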
(ML 13.5) Generative process specification (“a handy convention”)
$X_1, X_2 \sim \text{Bernoulli}(1/2)$, independent
$X_3 \sim N(X_1 + X_2, \sigma^2)$
$X_4 \sim N(aX_2 + b, 1)$
$X = (X_1, \ldots, X_5)$ respects the following graph
and we have the factorization $p(x_1, \ldots, x_5) = p(x_1)p(x_2)p(x_3\vert x_1, x_2)p(x_4\vert x_2)p(x_5\vert x_4)$.
Nice for sampling and visualization.
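A minimal sampling sketch of this process (only $X_1, \ldots, X_4$ are specified above, so $X_5$ is omitted; the values of $a$, $b$, $\sigma$ are placeholders, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, a, b = 1.0, 2.0, -1.0      # placeholder constants, not given in the notes

def sample_once():
    x1 = rng.binomial(1, 0.5)              # X1 ~ Bernoulli(1/2)
    x2 = rng.binomial(1, 0.5)              # X2 ~ Bernoulli(1/2), independent of X1
    x3 = rng.normal(x1 + x2, sigma)        # X3 ~ N(X1 + X2, sigma^2); normal() takes the std dev
    x4 = rng.normal(a * x2 + b, 1.0)       # X4 ~ N(a*X2 + b, 1)
    return x1, x2, x3, x4

samples = [sample_once() for _ in range(5)]
```

Sampling simply follows the arrows: each variable is drawn once its parents have been drawn.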
Examples of (Directed) Graphical Models
(ML 13.6) Graphical model for Bayesian linear regression
$D = ((x_1, y_1), \ldots, (x_n, y_n)), x_i \in \mathbb{R}^d, y_i \in \mathbb{R}$
$f(x) = w^T\varphi(x)$
$w \sim N(0, \sigma_0^2I)$
$Y_i \sim N(w^T\varphi(x_i), \sigma^2)$, independent given $w$
$p(w, y_1, \ldots, y_n) = p(w)\prod_{i=1}^n p(y_i\vert w)$
Plate notation.
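A generative sketch of this model (the basis $\varphi$, the dimensions, and the noise levels below are placeholder choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3                                  # number of points, feature dimension
sigma0, sigma = 1.0, 0.5                      # prior and noise std devs (placeholders)

def phi(x):
    # Placeholder basis; any fixed feature map works here.
    return np.array([1.0, x, x ** 2])

xs = rng.uniform(-1, 1, size=n)               # inputs x_i are fixed, not modeled
w = rng.normal(0.0, sigma0, size=d)           # w ~ N(0, sigma_0^2 I)
ys = np.array([rng.normal(w @ phi(x), sigma)  # Y_i ~ N(w^T phi(x_i), sigma^2), indep. given w
               for x in xs])
```

Note that only $w$ and the $Y_i$ are nodes in the graphical model; the $x_i$ enter as fixed quantities, matching the factorization $p(w)\prod_i p(y_i\vert w)$.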
(ML 13.7) Graphical model for Bayesian Naive Bayes
$D = ((x^{(1)}, y_1), \ldots, (x^{(n)}, y_n)), x^{(i)} \in \mathbb{R}^d, y_i \in \{1, \ldots, m\}$
$\pi \sim \text{Dir}(\alpha)$
$r_{jy} \sim \text{Dir}(\beta)$ for $j \in \{1, \ldots, d\}, y \in \{1, \ldots, m\}$
$Y \sim \pi$ and $Y_i \sim \pi$ for $i = 1, \ldots, n$
$X_j \sim r_{jY}$ and $X_j^{(i)} \sim r_{jY_i}$ for $j = 1, \ldots, d$
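A sampling sketch of this generative process, assuming discrete feature values $X_j \in \{1, \ldots, K\}$ so that each Dirichlet draw $r_{jy}$ parameterizes a categorical distribution (the sizes and hyperparameters below are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, K = 10, 4, 3, 5          # points, features, classes, feature values (placeholders)
alpha = np.ones(m)                # Dirichlet hyperparameters (placeholders)
beta = np.ones(K)

pi = rng.dirichlet(alpha)                          # pi ~ Dir(alpha)
r = rng.dirichlet(beta, size=(d, m))               # r[j, y] ~ Dir(beta), shape (d, m, K)

ys = rng.choice(m, size=n, p=pi)                   # Y_i ~ pi
xs = np.array([[rng.choice(K, p=r[j, y])           # X_j^(i) ~ r_{j, Y_i}
                for j in range(d)]
               for y in ys])
```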
(ML 13.8) (ML 13.9) Conditional independence in (directed) graphical models - basic examples
1 “Tail-tail”: $X \leftarrow Z \rightarrow Y$ (common parent $Z$)
2 “Head-tail” (or “Tail-head”): $X \rightarrow Z \rightarrow Y$ (chain through $Z$)
3 “Head-head”: $X \rightarrow Z \leftarrow Y$ (common child $Z$; see the simulation sketch below)
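A small simulation of the “head-head” case, using the textbook collider structure $X \rightarrow Z \leftarrow Y$ with $Z = X \oplus Y$ (my choice of example, not from the lecture): $X$ and $Y$ are marginally independent, but become dependent once $Z$ is observed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.5, size=100_000)   # X ~ Bernoulli(1/2)
y = rng.binomial(1, 0.5, size=100_000)   # Y ~ Bernoulli(1/2), independent of X
z = x ^ y                                # Z = X xor Y: head-head ("collider") at Z

# Marginally: P(X=1 | Y=1) is about P(X=1).
print(x[y == 1].mean(), x.mean())        # both ~ 0.5

# Given Z = 0, X must equal Y, so X and Y are perfectly dependent.
mask = z == 0
print(x[mask & (y == 1)].mean(),         # ~ 1.0
      x[mask & (y == 0)].mean())         # ~ 0.0
```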
(ML 13.10) (ML 13.11) D-separation
“The whole point of graphical models is to express the conditional independence properties of a probability distribution. And the D-separation criterion gives you a way to read off those conditional independence properties from the graphical model for a probability distribution.”
D-separation is a vast generalization of these:
Terminology. Descendants of a vertex $v$: the vertices reachable from $v$ by a directed path.
Let $G$ be a DAG. Let $A, B, C$ be disjoint subsets of vertices.
($A \cup B \cup C$ is not necessarily all of the vertices.)
Definition. A path (not necessarily directed) between two vertices is blocked (w.r.t. $C$) if it passes through a vertex $v$ s.t. either (a) the arrows are head-tail or tail-tail, and $v \in C$ or (b) the arrows are head-head and $v \not\in C$ and none of the descendants of $v$ are in $C$.
Definition. $A$ and $B$ are d-separated by $C$ if all paths from a vertex of $A$ to a vertex of $B$ are blocked (w.r.t. $C$).
Theorem (d-separation). If $A$ and $B$ are d-separated by $C$ then $A$ and $B$ are conditionally independent given $C$.
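A brute-force sketch of these definitions for small DAGs given as parent sets; it checks the blocking rules above path by path, and the example graph at the end is my own illustration, not one from the lecture.

```python
def children_map(parents):
    """Invert a {vertex: set of parents} description into a children map."""
    ch = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            ch[p].add(v)
    return ch

def descendants(children, v):
    """All vertices reachable from v by a directed path."""
    seen, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def undirected_paths(parents, children, a, b, path=None):
    """Yield every simple (not necessarily directed) path from a to b."""
    path = path or [a]
    if path[-1] == b:
        yield list(path)
        return
    for nxt in parents[path[-1]] | children[path[-1]]:
        if nxt not in path:
            yield from undirected_paths(parents, children, a, b, path + [nxt])

def blocked(path, parents, children, C):
    """Check the two blocking cases at each interior vertex of the path."""
    for prev, v, nxt in zip(path, path[1:], path[2:]):
        head_head = prev in parents[v] and nxt in parents[v]
        if head_head:
            if v not in C and not (descendants(children, v) & C):
                return True   # case (b): head-head, with v and its descendants outside C
        elif v in C:
            return True       # case (a): head-tail or tail-tail, with v in C
    return False

def d_separated(parents, A, B, C):
    children = children_map(parents)
    return all(blocked(p, parents, children, set(C))
               for a in A for b in B
               for p in undirected_paths(parents, children, a, b))

# Illustrative DAG: 1 -> 3 <- 2 and 3 -> 4 (each vertex mapped to its parents).
parents = {1: set(), 2: set(), 3: {1, 2}, 4: {3}}
print(d_separated(parents, {1}, {2}, set()))   # True: the collider at 3 blocks the path
print(d_separated(parents, {1}, {2}, {4}))     # False: conditioning on a descendant of 3 unblocks it
```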
(ML 13.12) (ML 13.13) How to use D-separation - illustrative examples
Example 1. $C = \{3\}$. Are $X_i$ and $X_j$ independent given $X_3$?
Example 2.
Example 3.