Notes on Probability Primer 5: Multiple random variables

(PP 5.1) Multiple discrete random variables

Definition. Given $(\Omega, \mathscr{A}, P)$, a random vector is a measurable function

$X : \Omega \rightarrow \mathbb{R}^d$

where $d \in \mathbb{N}$.

Definition. A random vector $X \in \mathbb{R}^d$ is discrete if $X(\Omega)$ is countable.

Definition. The (joint) PMF (or joint distribution) of a discrete random vector $X \in \mathbb{R}^d$ is the function

$p:\mathbb{R}^d \rightarrow [0,1]$

such that $p(x)= P[X=x]$ for all $x \in \mathbb{R}^d$.

Notation. $X = (X_1, \ldots, X_d)$, $x = (x_1, \ldots, x_d)$
($X = x$ means $X_i = x_i$ for all $i$)
$p(x) = p(x_1, \ldots, x_d)$
$p_X(x) = p(x)$

Remark.

$P[X \in A] = \sum_{x\in \mathscr{X}, x \in A} p(x)$

where $\mathscr{X} = X(\Omega)$.

Remark. $g(X)$ is a random vector if $X \in \mathbb{R}^d$ is a random vector and $g: \mathbb{R}^d \rightarrow \mathbb{R}^k$ is measurable.

Proposition.

$\mathbb{E}(g(X)) = \sum_{x\in \mathscr{X}} g(x)p(x)$

for any measurable $g: \mathbb{R}^d \rightarrow \mathbb{R}$ such that this sum is well-defined.
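The two sums above, $P[X \in A]$ and $\mathbb{E}(g(X))$, can be sketched in Python. This is only an illustration: the PMF `p` (two fair coin flips) and the helper names `prob` and `expect` are assumptions made for the example, not part of the notes.

```python
from fractions import Fraction

# Hypothetical joint PMF of X = (X_1, X_2): two fair coin flips.
# Keys are the support points x, values are p(x).
p = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

def prob(p, event):
    """P[X in A]: sum p(x) over support points x for which event(x) is true."""
    return sum(px for x, px in p.items() if event(x))

def expect(p, g):
    """E(g(X)): sum g(x) * p(x) over the support."""
    return sum(g(x) * px for x, px in p.items())

total = sum(p.values())                               # a PMF sums to 1
p_at_least_one = prob(p, lambda x: x[0] + x[1] >= 1)  # P[X_1 + X_2 >= 1]
mean_sum = expect(p, lambda x: x[0] + x[1])           # E(X_1 + X_2)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with hand calculations.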

(PP 5.2) Marginals and conditionals

We treat the 2-dimensional case only.

Fix $(X, Y) \in \mathbb{R}^2$ with $p(x, y) = P[X=x, Y=y]$.

Definition. The marginal PMF of $X$ is $p_X(x) = P[X = x]$.

Proposition.

$p_X(x) = \sum_{y \in \mathscr{Y}} p(x, y)$

where $\mathscr{Y} = Y(\Omega)$.

Proof. $p_X(x) = P[X=x] = \sum_{y \in \mathscr{Y}} P[X=x, Y=y] = \sum_{y \in \mathscr{Y}}p(x,y)$,
noting that $\{\omega \in \Omega : X(\omega) = x\} = \bigcup_{y \in \mathscr{Y}}\{\omega \in \Omega : X(\omega) = x, Y(\omega) = y\}$ is a countable disjoint union.
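As a small sketch of the marginalization step, the following Python sums a joint PMF over one coordinate. The table `joint` and the helper `marginal` are hypothetical, chosen only to make the sum concrete.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF p(x, y) on {0, 1} x {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def marginal(joint, axis):
    """Sum the joint PMF over the other coordinate (axis=0 gives p_X, axis=1 gives p_Y)."""
    out = defaultdict(Fraction)  # Fraction() is 0
    for xy, pxy in joint.items():
        out[xy[axis]] += pxy
    return dict(out)

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
```

Both marginals are themselves PMFs, so each should sum to 1.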

Notation. $p(x) = p_X(x)$, $p(y) = p_Y(y)$

Definition. The conditional PMF of $X$ given $Y=y$ is

$p(x\vert y) = P[X=x \vert Y = y]$

(when $P[Y=y] > 0$.)

Remark.

$p(x\vert y) = \frac{P[X=x, Y=y]}{P[Y=y]} = \frac{p(x, y)}{p(y)}$

Definition. The conditional expectation of $X$ given $Y=y$ (when $p(y) > 0$) is

$\mathbb{E}(X\vert Y=y) = \sum_{x \in \mathscr{X}} xp(x \vert y)$

when this sum is well-defined.

Remark. $\mathbb{E}(X\vert Y)$ is a random variable that depends on the random variable $Y$.
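A minimal sketch of the conditional PMF and conditional expectation, assuming a hypothetical joint table `joint`; the function names are illustrative. The dictionary `cond_exp` tabulates $y \mapsto \mathbb{E}(X\vert Y=y)$, which is the function of $Y$ that the remark refers to.

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y) on {0, 1} x {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def marginal_y(joint):
    """p(y) = sum over x of p(x, y)."""
    p_y = {}
    for (_, y), pxy in joint.items():
        p_y[y] = p_y.get(y, 0) + pxy
    return p_y

def conditional_pmf_x(joint, y):
    """p(x | y) = p(x, y) / p(y); only defined when p(y) > 0."""
    py = marginal_y(joint).get(y, 0)
    if py == 0:
        raise ValueError("conditional PMF is undefined when p(y) = 0")
    return {x: pxy / py for (x, yy), pxy in joint.items() if yy == y}

def conditional_expectation_x(joint, y):
    """E(X | Y = y) = sum over x of x * p(x | y)."""
    return sum(x * px for x, px in conditional_pmf_x(joint, y).items())

# E(X | Y) viewed as a function of the value taken by Y.
cond_exp = {y: conditional_expectation_x(joint, y) for y in marginal_y(joint)}
```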

(PP 5.3) Multiple random variables with densities

(PP 5.4) Independence, Covariance, and Correlation

Consider random variables $X, Y, Z$, etc. on $(\Omega, \mathscr{A}, P)$.

$\leadsto$ joint distributions arise (for random vectors)

Definition. $X, Y$ independent if
(discrete) $\,\,\,$ $p(x, y) = p(x) p(y)$ for all $x \in \mathscr{X}, y \in \mathscr{Y}$.
(density) $\,\,\,$ $f(x, y) = f(x) f(y)$ for all $x \in \mathbb{R}, y \in \mathbb{R}$.
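For the discrete case, the factorization can be checked directly on a finite table. The sketch below is an illustration only; the two example PMFs and the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Return (p_X, p_Y) computed from a joint PMF given as {(x, y): p(x, y)}."""
    p_x, p_y = {}, {}
    for (x, y), pxy in joint.items():
        p_x[x] = p_x.get(x, 0) + pxy
        p_y[y] = p_y.get(y, 0) + pxy
    return p_x, p_y

def is_independent(joint):
    """Check p(x, y) == p(x) p(y) for every x in the support of X and y in the support of Y."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x, y in product(p_x, p_y))

# Hypothetical examples: two independent fair coins, and the dependent pair Y = X.
two_coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
copy_of_x = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

coins_independent = is_independent(two_coins)  # factorizes
copy_independent = is_independent(copy_of_x)   # fails at (0, 1): 0 != 1/4
```

Note that the check runs over all pairs $(x, y)$, including pairs with $p(x, y) = 0$, which is where dependent examples typically fail.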

Definition. $X_1, \ldots, X_d$ independent if
(discrete) $\,\,\,$ $p(x_1, \ldots, x_d) = \prod_{i=1}^d p(x_i)$ for all $x_i \in \mathscr{X}_i$.
(density) $\,\,\,$ $f(x_1, \ldots, x_d) = \prod_{i=1}^d f(x_i)$ for all $x_i \in \mathscr{X}_i$.

Theorem. TFAE (the following are equivalent).

$X, Y$ independent;
$P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$ for all $A, B \in \mathscr{B}(\mathbb{R})$ (the Borel sets);
$g(X), h(Y)$ independent for all measurable $g, h: \mathbb{R} \rightarrow \mathbb{R}$;
$\mathbb{E}(g(X)h(Y)) = \mathbb{E}(g(X))\mathbb{E}(h(Y))$ for all measurable $g, h: \mathbb{R} \rightarrow \mathbb{R}_{\ge 0}$;
$\mathbb{E}(g(X)h(Y)) = \mathbb{E}(g(X))\mathbb{E}(h(Y))$ for all measurable $g, h: \mathbb{R} \rightarrow \mathbb{R}$ s.t. $\mathbb{E}(\vert g(X)\vert), \mathbb{E}(\vert h(Y)\vert) < \infty$.

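Remark. $X, Y$ independent and $\mathbb{E}(\vert X\vert), \mathbb{E}(\vert Y\vert) < \infty$ implies $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$.
The converse does not hold: $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$ does not imply that $X, Y$ are independent.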
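One standard counterexample for the converse, sketched in Python: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. This particular example is an illustrative choice, not taken from the notes. The product rule for expectations holds, yet the joint PMF does not factorize.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1} and Y = X**2 (illustrative counterexample).
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(joint, g):
    """E(g(X, Y)) = sum over the support of g(x, y) * p(x, y)."""
    return sum(g(x, y) * pxy for (x, y), pxy in joint.items())

e_x = expect(joint, lambda x, y: x)       # E(X)
e_y = expect(joint, lambda x, y: y)       # E(Y)
e_xy = expect(joint, lambda x, y: x * y)  # E(XY) = E(X^3)

# Marginal probabilities at x = 0 and y = 0.
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)

product_rule_holds = (e_xy == e_x * e_y)
factorizes_at_00 = (joint[(0, 0)] == p_x0 * p_y0)
```

Here $p(0, 0) = 1/3$ while $p_X(0)\,p_Y(0) = 1/9$, so $X$ and $Y$ are dependent even though $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y) = 0$.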

Definition. The covariance of $X, Y$ is

${\rm Cov}(X, Y) = \mathbb{E}((X - \mathbb{E}(X))(Y - \mathbb{E}(Y)))$

Remark.

${\rm Cov}(X, X) = \sigma^2(X)$
${\rm Cov}(X, Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$
$X, Y$ independent $\Rightarrow$ ${\rm Cov}(X, Y) = 0$

Definition. The (Pearson) correlation coefficient of $X, Y$ is

$\rho(X, Y) = \frac{ {\rm Cov} (X, Y)}{\sigma(X)\sigma(Y)}$

(when $0 < \sigma(X), \sigma(Y) < \infty$.)
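A short sketch computing the covariance both ways (by the definition and by $\mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)$) and then the correlation coefficient, for a hypothetical joint PMF `joint`. The table and helper name are assumptions for illustration.

```python
import math
from fractions import Fraction

# Hypothetical joint PMF p(x, y) on {0, 1} x {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

def expect(joint, g):
    """E(g(X, Y)) = sum over the support of g(x, y) * p(x, y)."""
    return sum(g(x, y) * pxy for (x, y), pxy in joint.items())

e_x = expect(joint, lambda x, y: x)
e_y = expect(joint, lambda x, y: y)

# Covariance from the definition and from the shortcut formula.
cov_def = expect(joint, lambda x, y: (x - e_x) * (y - e_y))
cov_short = expect(joint, lambda x, y: x * y) - e_x * e_y

var_x = expect(joint, lambda x, y: (x - e_x) ** 2)  # Cov(X, X)
var_y = expect(joint, lambda x, y: (y - e_y) ** 2)  # Cov(Y, Y)

# Pearson correlation; requires both variances to be strictly positive.
rho = float(cov_def) / math.sqrt(var_x * var_y)
```

The two covariance computations agree exactly because the arithmetic is done with `Fraction`; only the final square root is floating point.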

(PP 5.5) Law of large numbers and Central limit theorem

Definition. $X_1, \ldots, X_n$ are iid (independent and identically distributed) if they are independent and $p_{X_1} = \cdots = p_{X_n}$ (or $f_{X_1} = \cdots = f_{X_n}$ in the density case).

Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and variance $0< \sigma^2 < \infty$.

Theorem. (LLN) “very intuitive result”

$\frac{1}{n}\sum_{i=1}^n X_i \rightarrow \mu$

w.p. $1$ (almost surely).

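Example. $X_i \sim$ Bernoulli$(\mu)$, $X_i \in \{0, 1\}$. Then $\frac{1}{n}\sum_{i=1}^n X_i$ is the fraction of ones among the first $n$ trials, which converges to $\mu$ w.p. $1$.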
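The Bernoulli example can be simulated as a rough sanity check. This is a sketch under assumptions: the value $\mu = 0.3$, the sample sizes, and the seed are arbitrary choices, and a simulation only illustrates the theorem rather than proving it.

```python
import random

def bernoulli_sample_mean(mu, n, rng):
    """Average of n iid Bernoulli(mu) draws, i.e. the fraction of ones."""
    return sum(1 if rng.random() < mu else 0 for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible
mu = 0.3                # assumed success probability for the illustration

# Sample means for increasing n; these should settle near mu as n grows.
means = {n: bernoulli_sample_mean(mu, n, rng) for n in (10, 1_000, 100_000)}
errors = {n: abs(m - mu) for n, m in means.items()}
```

For small $n$ the sample mean can be far from $\mu$; the LLN only describes the behaviour as $n \rightarrow \infty$.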

Theorem. (CLT) “astonishing” (very surprising)

$\sqrt{\frac{n}{\sigma^2}}\left(\frac{1}{n}\sum_{i=1}^n X_i - \mu\right) \rightarrow^D N(0, 1)$

in the sense that the CDF of the former converges to the CDF of the latter.
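A simulation sketch of this convergence, assuming $X_i \sim$ Uniform$[0, 1)$ (so $\mu = 1/2$, $\sigma^2 = 1/12$); the distribution, $n$, the number of trials, and the seed are all illustrative choices. It compares the empirical CDF of the standardized sample mean with the standard normal CDF at a few points.

```python
import math
import random
from statistics import NormalDist

MU, SIGMA2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform[0, 1)

def standardized_mean(n, rng):
    """sqrt(n / sigma^2) * (sample mean - mu) for n iid Uniform[0, 1) draws."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n / SIGMA2) * (xbar - MU)

rng = random.Random(1)  # fixed seed so the run is reproducible
n, trials = 100, 5_000
zs = [standardized_mean(n, rng) for _ in range(trials)]

def ecdf(t):
    """Empirical CDF of the simulated standardized means at t."""
    return sum(z <= t for z in zs) / trials

# Gap between the empirical CDF and the N(0, 1) CDF at a few test points.
std_normal = NormalDist()
gaps = {t: abs(ecdf(t) - std_normal.cdf(t)) for t in (-1.0, 0.0, 1.0)}
```

The gaps should be small but not zero: part of the discrepancy is Monte Carlo noise from the finite number of trials, and part is the finite-$n$ approximation error.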