• Notes on Machine Learning 7: Bayesian Inference

    (ML 7.1) Bayesian inference - A simple example Thomas Bayes “Put distributions on everything, and then use rules of probability!” Bayes' rule Example. $D= (x_1, x_2, x_3) = (101, 100.5, 101.5)$ ($n=3$) $X \sim N(\theta, 1)$ iid given $\theta$ $\theta_{\rm MLE} = \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i = 101$ $\theta...
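
The MLE in this excerpt can be checked numerically. The following is a minimal sketch, assuming the unit-variance Gaussian model stated above; the function names `log_likelihood` and `gaussian_mean_mle` are illustrative and do not come from the original notes.

```python
import math

# Data from the excerpt: n = 3 observations.
data = [101.0, 100.5, 101.5]


def log_likelihood(theta, samples):
    """Log-likelihood of theta under X_i ~ N(theta, 1), iid given theta."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2
        for x in samples
    )


def gaussian_mean_mle(samples):
    """For a Gaussian with known variance, the MLE of the mean is the sample mean."""
    return sum(samples) / len(samples)


theta_mle = gaussian_mean_mle(data)
print(theta_mle)

# Sanity check: nearby values of theta give a lower log-likelihood.
for delta in (-0.5, -0.1, 0.1, 0.5):
    assert log_likelihood(theta_mle, data) > log_likelihood(theta_mle + delta, data)
```

The final loop only spot-checks a few neighbouring values; it is a sanity check on the closed-form answer, not a proof that the sample mean is the maximiser.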


  • Notes on Probability Primer 3: Random variables

    (PP 3.1) Random Variables - Definition and CDF “Random variables are where the rubber meets the road in probability.” – Mathematical Monk “Random variable is not necessarily random and it’s not necessarily a variable.” – David Duvenaud (TWiML&AI: Composing Graphical Models With Neural Networks) Definition. Given $(\Omega, \mathscr{A}, P)$, a...


  • Notes on Machine Learning 6: Maximum a posteriori (MAP) estimation

    (ML 6.1) Maximum a posteriori (MAP) estimation Setup. Given data $D = (x_1, \ldots, x_n)$, $x_i \in \mathbb{R}^d$. Assume a joint distribution $p(D, \theta) = p(D\vert \theta)p(\theta)$ where $\theta$ is a RV. Goal: choose a good value of $\theta$ for $D$. Choose $\theta_{\rm MAP} = \arg\max_\theta p(\theta\vert D)$. cf....
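
As a short worked step connecting the joint factorization in this setup to the MAP objective (a sketch, assuming $p(D) > 0$): by Bayes' rule, the posterior is proportional to likelihood times prior, and the normalizer $p(D)$ does not depend on $\theta$, so it can be dropped from the maximization.

$$
\theta_{\rm MAP} = \arg\max_\theta p(\theta\vert D) = \arg\max_\theta \frac{p(D\vert \theta)\,p(\theta)}{p(D)} = \arg\max_\theta p(D\vert \theta)\,p(\theta)
$$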


  • Notes on Machine Learning 5: MLE for Exponential Families

    (ML 5.1) (ML 5.2) Exponential families Definition. Let $\Theta \subset \mathbb{R}^k$. An exponential family is a set $\{p_\theta : \theta \in \Theta\}$ of PMFs or PDFs on $\mathbb{R}^d$ s.t. $\theta \in \mathbb{R}^k$: parameter $x \in \mathbb{R}^d$: “distribution variable” $\eta_i : \Theta \rightarrow \mathbb{R}$ $s_i : \mathbb{R}^d \rightarrow \mathbb{R}$ (sufficient statistics)...


  • Notes on Information Theory 1: Information theory and Coding

    (IC 1.1) Information theory and Coding - Outline of topics Information theory & coding Closely related fields: Cryptography & Cryptanalysis Algorithmic information theory (Kolmogorov complexity & minimum description length) Network information theory (less closely) Statistics & Machine learning (also) Portfolio theory (also) Gambling (IC 1.2) Applications of Compression codes “Source...