Notes on Probability Primer 1: Measure theory

(PP 1.1) Measure theory: Why measure theory - The Banach-Tarski Paradox

A more detailed explanation of the Banach-Tarski paradox can be found here: The Banach–Tarski Paradox.

(PP 1.2) Measure theory: $\sigma$-algebras

Definition. Given a set $\Omega$, a $\sigma$-algebra on $\Omega$ is a nonempty collection $\mathcal{A} \subset 2^{\Omega}$ s.t.

closed under complements ($E \in \mathcal{A} \Rightarrow E^c \in \mathcal{A}$)
closed under countable unions ($E_1, E_2, \ldots \in \mathcal{A} \Rightarrow \cup_{i = 1}^\infty E_i \in \mathcal{A}$)

Remark

$\Omega \in \mathcal{A}$ (since $\mathcal{A}$ is nonempty, there is some $E \in \mathcal{A}$; then $E^c \in \mathcal{A}$, so $\Omega = E \cup E^c \in \mathcal{A}$)
$\emptyset \in \mathcal{A}$ ($\emptyset = \Omega^c \in \mathcal{A}$)
$\mathcal{A}$ is closed under countable intersections:
If $E_1, E_2, \ldots \in \mathcal{A}$, then (by De Morgan’s laws)

$\cap_{i=1}^\infty E_i = \cap_{i=1}^\infty (E_i^c)^c = (\cup_{i=1}^\infty E_i^c)^c \in \mathcal{A}$
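On a finite $\Omega$ the axioms above can be checked mechanically, since countable unions reduce to finite unions and closure under pairwise unions is enough. The following is a minimal sketch under that finiteness assumption; the function name `is_sigma_algebra` is hypothetical and not part of the notes.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra axioms for subsets of a FINITE set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    if not sets:
        return False  # the collection must be nonempty
    if any(not s <= omega for s in sets):
        return False  # every member must be a subset of omega
    if any(omega - s not in sets for s in sets):
        return False  # closed under complements
    # On a finite set, closure under pairwise unions implies
    # closure under all (finite = countable) unions.
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_algebra(omega, [set(), {1}, omega]))             # False: {2, 3, 4} is missing
```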

(PP 1.3) Measure theory: Measures

Definition. Given $\mathscr{C} \subset 2^\Omega$, the $\sigma$-algebra generated by $\mathscr{C}$, written $\sigma(\mathscr{C})$, is the “smallest” $\sigma$-algebra containing $\mathscr{C}$. (That is, $\sigma(\mathscr{C}) = \cap_{\mathcal{A} \supset \mathscr{C}} \mathcal{A}$, where the intersection runs over all $\sigma$-algebras $\mathcal{A}$ on $\Omega$ that contain $\mathscr{C}$.)

Remark. $\sigma(\mathscr{C})$ always exists because:

$2^\Omega$ is a $\sigma$-algebra
any intersection of $\sigma$-algebras is a $\sigma$-algebra

Examples.

$\mathcal{A} = \{\emptyset, \Omega\}$
$\mathcal{A} = \{\emptyset, E, E^c, \Omega\}$
If $\Omega = \mathbb{R}$ then the Borel $\sigma$-algebra is $\mathcal{B} = \sigma(\mathscr{T})$ where $\mathscr{T} =$ $\{$open sets of $\mathbb{R} \}$
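For a finite $\Omega$, $\sigma(\mathscr{C})$ can be computed directly by repeatedly adding complements and unions until nothing new appears. This is only an illustrative sketch for the finite case (it says nothing about the Borel $\sigma$-algebra, which cannot be enumerated this way); the name `generated_sigma_algebra` is hypothetical.

```python
def generated_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on a FINITE set omega containing the generators."""
    omega = frozenset(omega)
    # Every sigma-algebra contains the empty set and omega, so start with them.
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - s for s in sets}              # complements
        new |= {a | b for a in sets for b in sets}   # pairwise unions
        if new <= sets:                              # nothing new: closed
            return sets
        sets |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(sigma))                  # 8
print(frozenset({3, 4}) in sigma)  # True
```

With a single generator $E$ the same procedure returns $\{\emptyset, E, E^c, \Omega\}$, matching the second example above.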

Definition. A measure $\mu$ on $\Omega$ with $\sigma$-algebra $\mathcal{A}$ is a function $\mu : \mathcal{A} \rightarrow [0, \infty]$ s.t.

$\mu(\emptyset) = 0$
$\mu(\cup_{i=1}^\infty E_i) = \sum_{i=1}^\infty \mu(E_i)$ for any sequence $E_1, E_2, \ldots \in \mathcal{A}$ of pairwise disjoint sets (countable additivity)

Definition. A probability measure is a measure $P$ s.t. $P(\Omega)= 1$

(PP 1.4) Measure theory: Examples of Measures

The defining conditions of a probability measure are called Kolmogorov’s axioms (Kolmogorov is regarded as the father of modern probability theory).

Examples.

(Finite set) $\Omega = \{1, \ldots, n\}, \mathcal{A} = 2^\Omega$, $P(\{k\}) = P(k) = \frac{1}{n}$ (gives uniform distribution)
$P(\{1, 2, 4\}) = P(\{1\} \cup \{2\} \cup \{4\}) = P(1) + P(2) + P(4) = \frac{3}{n}$ (for $n \ge 4$)
(Countably infinite) $\Omega = \{1, 2, 3, \ldots \}, \mathcal{A} = 2^\Omega$
$P(k) =$ probability it takes $k$ coinflips to get heads $=\alpha(1-\alpha)^{k-1} = \frac{1}{2}(1-\frac{1}{2})^{k-1}$
(gives geometric distribution)
(Uncountable) $\Omega = [0, \infty)$, $\mathcal{A} = \mathcal{B}([0, \infty))$
$P([0, x)) = 1-e^{-x}$ $\forall x>0$ (gives exponential distribution)
Note: $P(\{x\}) = 0$ $\forall x>0$ (a symptom of continuous distributions)
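The three examples above can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, the geometric sum is truncated at a finite number of terms, and $P([a, b)) = e^{-a} - e^{-b}$ is derived from $P([0, x)) = 1 - e^{-x}$ by additivity rather than stated in the notes.

```python
import math
from fractions import Fraction

def uniform_prob(event, n):
    """P(E) = |E| / n on Omega = {1, ..., n}; assumes event is a subset of Omega."""
    return Fraction(len(set(event)), n)

def geometric_pmf(k, alpha=0.5):
    """P(k) = alpha * (1 - alpha)^(k - 1): first heads on flip k."""
    return alpha * (1 - alpha) ** (k - 1)

def exponential_prob(a, b):
    """P([a, b)) = P([0, b)) - P([0, a)) = e^{-a} - e^{-b} for 0 <= a <= b."""
    return math.exp(-a) - math.exp(-b)

print(uniform_prob({1, 2, 4}, 6))                   # 1/2
print(sum(geometric_pmf(k) for k in range(1, 60)))  # close to 1
print(exponential_prob(1.0, 1.0 + 1e-9))            # tiny: intervals shrinking to a point have probability tending to 0
```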

(PP 1.5) Measure theory: Basic Properties of Measures

(4) Lebesgue measure (on $\mathbb{R}$). $\Omega = \mathbb{R}, \mathcal{A} = \mathcal{B}(\mathbb{R})$

$\mu((a, b)) = b-a$

for any $a, b \in \mathbb{R}, a< b$.

Theorem (Basic properties of measures). Let $(\Omega, \mathcal{A}, \mu)$ be a measure space.

Monotonicity: If $E, F \in \mathcal{A}$ and $E \subset F$, then $\mu(E) \le \mu(F)$.
Subadditivity (handy!): If $E_1, E_2, \cdots \in \mathcal{A}$, then $\mu(\cup_{i=1}^\infty E_i) \le \sum_{i=1}^\infty \mu(E_i)$
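Monotonicity and subadditivity are easy to see on a toy case. The sketch below uses the counting measure $\mu(E) = |E|$ on finite sets, which is an assumption of this example (it is a standard measure but does not appear in the notes); overlapping sets make the subadditivity inequality strict.

```python
def counting_measure(event):
    """mu(E) = number of elements of E (counting measure on a finite set)."""
    return len(event)

# Monotonicity: E is a subset of F, so mu(E) <= mu(F).
E, F = {1, 2}, {1, 2, 3}
print(counting_measure(E) <= counting_measure(F))  # True

# Subadditivity: the sets overlap, so the sum over-counts the union.
sets = [{1, 2}, {2, 3}, {3, 4}]
union = set().union(*sets)
print(counting_measure(union), sum(counting_measure(s) for s in sets))  # 4 6
```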

(PP 1.6) Measure theory: Basic Properties of Measures (continued)

Continuity from below: If $E_1, E_2, \cdots \in \mathcal{A}$ and $E_1 \subset E_2 \subset \cdots$, then $\mu(\cup_{i=1}^\infty E_i) = \lim_{i \rightarrow \infty} \mu(E_i)$
Continuity from above: If $E_1, E_2, \cdots \in \mathcal{A}$ and $E_1 \supset E_2 \supset \cdots$ and $\mu(E_1) < \infty$, then $\mu(\cap_{i=1}^\infty E_i) = \lim_{i \rightarrow \infty} \mu(E_i)$

[Figure: continuity from below]

[Figure: continuity from above]

Properties 3 and 4 are very innocent looking but are essential in proving pretty nontrivial theorems!
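Both continuity properties can be watched numerically with the exponential example $P([0, x)) = 1 - e^{-x}$ from earlier. In this sketch (hypothetical function name, finitely many sample values of $n$), the increasing sets $E_n = [0, n)$ have union $[0, \infty)$, and the decreasing sets $E_n = [0, 1/n)$ have intersection $\{0\}$.

```python
import math

def exp_prob_initial(x):
    """P([0, x)) = 1 - e^{-x} for the exponential example in these notes."""
    return 1.0 - math.exp(-x)

# Continuity from below: E_n = [0, n) increases to [0, inf), so P(E_n) -> 1.
below = [exp_prob_initial(n) for n in (1, 10, 100)]

# Continuity from above: E_n = [0, 1/n) decreases to {0}, so P(E_n) -> P({0}) = 0.
above = [exp_prob_initial(1.0 / n) for n in (1, 10, 100, 10**6)]

print(below)
print(above)
```

The finiteness condition $\mu(E_1) < \infty$ matters: for Lebesgue measure, $E_n = [n, \infty)$ decreases to $\emptyset$, yet $\mu(E_n) = \infty$ for every $n$.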

(PP 1.7) Measure theory: More Properties of Probability Measures

Facts. Let $(\Omega, \mathcal{A}, P)$ be a probability space (a measure space whose measure $P$ is a probability measure) with $E, F, E_i \in \mathcal{A}$.

$P(E \cup F) = P(E) + P(F)$ if $E \cap F = \emptyset$
$P(E \cup F) = P(E) + P(F) - P(E \cap F)$
$P(E^c) = 1 - P(E)$
$P(E \cap F^c) = P(E) - P(E \cap F)$
(Inclusion-Exclusion formula) $P(\cup_{i=1}^n E_i) = \sum_{i} P(E_i) - \sum_{i<j}P(E_i \cap E_j) + \sum_{i<j<k}P(E_i \cap E_j \cap E_k) - \cdots +(-1)^{n+1}P(E_1 \cap E_2 \cap \cdots \cap E_n)$
$P(\cup_{i=1}^n E_i) \le \sum_{i=1}^n P(E_i)$ and $P(\cup_{i=1}^\infty E_i) \le \sum_{i=1}^\infty P(E_i)$
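The inclusion-exclusion formula can be verified by brute force on the uniform distribution over $\{1, \ldots, n\}$. This is a small sketch with hypothetical helper names; exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, n):
    """Uniform probability on Omega = {1, ..., n}; assumes event is a subset of Omega."""
    return Fraction(len(event), n)

def inclusion_exclusion(events, n):
    """Right-hand side of the inclusion-exclusion formula."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)  # + for single events, - for pairs, + for triples, ...
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group), n)
    return total

events = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
union = set().union(*events)
print(prob(union, 6), inclusion_exclusion(events, 6))  # 5/6 5/6
```

The union bound in the last fact shows up here as well: $\sum_i P(E_i) = \frac{8}{6}$, which is at least $P(\cup_i E_i) = \frac{5}{6}$.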

(PP 1.8) Measure theory: CDFs and Borel Probability Measures

$\mathbb{R}, \mathcal{B}(\mathbb{R})$, $(a, b)$

Definition. A Borel measure on $\mathbb{R}$ is a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$

Definition. A CDF (cumulative distribution function) is a function $F : \mathbb{R} \rightarrow \mathbb{R}$ s.t.

$F$ is nondecreasing ($x \le y \Rightarrow F(x) \le F(y)$)
$F$ is right continuous ($\lim_{x \rightarrow a^+} F(x) = F(a)$)
$\lim_{x \rightarrow \infty} F(x) = 1$
$\lim_{x \rightarrow -\infty} F(x) = 0$

Theorem.

If $F$ is a CDF, then there is a unique Borel probability measure $P$ on $\mathbb{R}$ s.t. $P((-\infty, x]) = F(x)$ $\forall x \in \mathbb{R}$
If $P$ is a Borel probability measure on $\mathbb{R}$, then there is a unique CDF $F$ s.t. $F(x) = P((-\infty, x])$ $\forall x \in \mathbb{R}$
That is, there is an equivalence between CDFs and Borel probability measures.

In sum, CDFs “parametrize” Borel probability measures!
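One practical consequence of this correspondence is that a CDF alone determines the probability of every half-open interval, since $P((a, b]) = P((-\infty, b]) - P((-\infty, a]) = F(b) - F(a)$ for $a \le b$. The sketch below illustrates this with the exponential CDF, extended by $0$ to the negative reals; the function names are hypothetical.

```python
import math

def exponential_cdf(x):
    """F(x) = 1 - e^{-x} for x > 0 and F(x) = 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def interval_prob(cdf, a, b):
    """P((a, b]) = F(b) - F(a), valid for a <= b."""
    if a > b:
        raise ValueError("need a <= b")
    return cdf(b) - cdf(a)

print(interval_prob(exponential_cdf, -1.0, 0.0))          # 0.0: no mass on the negative reals
print(interval_prob(exponential_cdf, 0.0, math.log(2)))   # about 0.5: log(2) is the median
```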

(PP 1.R) References for Probability and Measure theory

Comment.

Excellent text – right level
Do the exercises

Real Analysis. (Undergrad.)

Rudin’s Principles of Mathematical Analysis

Probability.

Jacod & Protter, Probability Essentials (some of the proofs are not very satisfying)
Durrett, Probability: Theory and Examples (a bit on the advanced side)
Grimmett & Stirzaker, Probability and Random Processes

Real Analysis. (Grad.)

Folland’s Real Analysis