Notes on Machine Learning 9: Linear regression
by 장승환
(ML 9.1) Linear regression - Nonlinearity via basis functions
“It’s truly a workhorse of statistics!”
“It’s not just about lines & planes!”
Setup. Given $D = ((x_1, y_1), \ldots, (x_n, y_n))$ with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$.
Goal. Select a “good” $f : \mathbb{R}^d \rightarrow \mathbb{R}$ for predicting $y$ for a new $x$.
Basis functions.
The simplest class of functions $\mathbb{R}^d \rightarrow \mathbb{R}$ one can think of is the linear ones. These are precisely the functions $\mathbb{R}^d \rightarrow \mathbb{R}$ given by

$$f(x) = w^Tx$$

for some (fixed) vector $w \in \mathbb{R}^d$.
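The point of basis functions is that $f(x) = w^T\phi(x)$ stays linear in $w$ even when $\phi$ is nonlinear in $x$, so the whole linear machinery still applies. A minimal sketch, using a polynomial basis as an illustrative choice of $\phi$ (the helper name and weights are made up):

```python
import numpy as np

def poly_basis(x, degree=3):
    """Map scalar inputs to polynomial features phi(x) = (1, x, x^2, ..., x^degree)."""
    x = np.asarray(x, dtype=float)
    return np.stack([x**k for k in range(degree + 1)], axis=-1)

# f(x) = w^T phi(x): nonlinear in x, but still linear in the weights w.
w = np.array([1.0, -2.0, 0.5, 0.1])   # illustrative weights
x = np.linspace(-3, 3, 7)
f = poly_basis(x) @ w                  # shape (7,)
print(f)
```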
(ML 9.2) Linear regression - Definition & Motivation
Discriminative approach.
Instead of aiming to model the target function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ directly, we take a probabilistic approach by modelling the conditional distribution $p(y\vert x)$. We start with a parametrized family $p_\theta(y\vert x)$ with $\theta \in \Theta$ and estimate $\theta$ using the data $D = ((x_1, y_1), \ldots, (x_n, y_n))$. In other words, we figure out which $\theta$ the data comes from.
But what parametrized family should we choose?
One natural choice would be a Gaussian family: We set $p_\theta(y\vert x) = N(y\vert \mu(x), \sigma^2(x))$, meaning that for fixed $x$, the random variable $Y$ corresponding to $x$ is such that $Y \sim N(\mu(x), \sigma^2(x))$. What remains is to decide how $\mu(x)$ and $\sigma^2(x)$ depend on $\theta$.
We set the parameter to be $\theta = (w, \sigma^2)$ with $w \in \mathbb{R}^d$, $\sigma^2 > 0$, and take $\mu(x) = w^Tx$, $\sigma^2(x) = \sigma^2$, so that we have

$$p_\theta(y\vert x) = N(y\vert w^Tx, \sigma^2).$$

This is called (Gaussian) linear regression.
In effect, we have modelled the target function $f$ as the random variable

$$Y = w^Tx + \varepsilon,$$

where $\varepsilon \sim N(0, \sigma^2)$. Hence the term “linear.”
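To make the generative story concrete, here is a minimal sketch that samples a dataset from this model (the dimension, sample size, and parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 2, 100                    # input dimension and sample size (illustrative)
w_true = np.array([1.5, -0.7])   # assumed "true" weights
sigma = 0.3                      # noise standard deviation

X = rng.normal(size=(n, d))              # inputs x_1, ..., x_n as rows
eps = rng.normal(scale=sigma, size=n)    # epsilon ~ N(0, sigma^2)
y = X @ w_true + eps                     # Y = w^T x + epsilon
```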
(ML 9.3) Choosing f under linear regression
Why is linear regression, in some sense, such a natural model for regression?
In other words, why does it make sense to take $\mu(x) = w^Tx$?
For the square loss $L(y, \hat{y}) = (y - \hat{y})^2$, the function

$$f(x) = \mathbb{E}[Y \vert X = x] = \mu(x) = w^Tx$$

minimizes the expected loss. (cf. Ch.3 Decision theory - 3.4 Square loss)
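For completeness, the standard one-step argument behind that claim: for a fixed $x$ and any candidate prediction $a$,

$$\mathbb{E}\big[(Y - a)^2 \,\vert\, X = x\big] = \mathrm{Var}(Y \vert X = x) + \big(\mathbb{E}[Y \vert X = x] - a\big)^2,$$

which is minimized exactly at $a = \mathbb{E}[Y \vert X = x]$.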
(ML 9.4) (ML 9.5) (ML 9.6) MLE for linear regression
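The upshot of these segments is the standard result: under the Gaussian model above, maximizing the log-likelihood over $w$ is equivalent to minimizing $\sum_{i=1}^n (y_i - w^Tx_i)^2$, and the MLE of the noise variance is the mean squared residual. A minimal sketch (the function name is mine; `lstsq` is used rather than inverting $X^TX$, for numerical stability):

```python
import numpy as np

def mle_linear_regression(X, y):
    """MLE for Gaussian linear regression.

    X: (n, d) design matrix, y: (n,) targets.
    A sketch assuming X has full column rank.
    """
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares w = MLE of w
    residuals = y - X @ w_hat
    sigma2_hat = np.mean(residuals**2)              # MLE of the noise variance
    return w_hat, sigma2_hat
```

With the simulated `(X, y)` from the sketch in 9.2, `mle_linear_regression(X, y)` recovers `w_true` and `sigma**2` approximately.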
(ML 9.7) Basis functions MLE
How to model nonlinearity using linear regression.
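A sketch combining the two ingredients above: expand 1-D inputs with a polynomial basis (an illustrative choice; the ground-truth function and noise level are made up), then apply the same least-squares MLE to the design matrix $\Phi$ whose rows are $\phi(x_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Nonlinear ground truth with 1-D inputs (illustrative)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.shape)

# Design matrix of basis functions: Phi[i, k] = x_i^k
degree = 5
Phi = np.stack([x**k for k in range(degree + 1)], axis=-1)

# Same Gaussian MLE / least squares, now linear in w but nonlinear in x
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
sigma2_hat = np.mean((y - Phi @ w_hat) ** 2)
```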