Notes on Machine Learning 1: What is machine learning?
by 장승환
(ML 1.1) What is machine learning?
“Designing algorithms for inferring what is unknown from knowns.”
MM considers ML as a subfield of statistics, with emphasis on algorithms.
I got to read an interesting article, "Machine Learning vs. Statistics", thanks to Whi Kwon.
Applications
- Spam (filtering out)
- Handwriting (recognition)
- Google streetview
- Netflix (recommendation systems)
- Navigation
- Climate modelling
(ML 1.2) What is supervised learning?
Classification of ML problems: Supervised vs. Unsupervised
Supervised: Given training pairs $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$, choose a function $f$ such that $f(x) = y$ is a good prediction of the label $y$ for a new data point $x$.
- Classification: $y^{(i)} \in \{$finite set$\}$
- Regression: $y^{(i)} \in \mathbb{R}$ or $y^{(i)} \in \mathbb{R}^d$
- $x^{(i)}$ : data point
- $y^{(i)}$ : class/value/label
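As a minimal sketch of the regression case with $y^{(i)} \in \mathbb{R}$, the snippet below chooses $f$ among straight lines by least squares. The name `fit_line` and the toy data are assumptions made for illustration; the notes do not prescribe any particular way of choosing $f$.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = a*x + b to the pairs (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution of simple linear regression.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Made-up training set lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

f = fit_line(xs, ys)
print(f(4.0))  # prediction for a new data point
```

Classification has the same shape: only the type of $y^{(i)}$ (a finite set instead of $\mathbb{R}$) and the family of candidate functions change.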
(ML 1.3) What is unsupervised learning?
Much less well-defined.
Unsupervised: Given $x^{(1)}, \ldots, x^{(n)}$, find patterns in the data.
- Clustering (typical UL)
- Density estimation (much more well-defined)
- Dimensionality reduction
- Feature learning
- many more
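Density estimation is the easiest of these to pin down, so here is a rough sketch of it: a histogram estimator built from unlabeled points $x^{(1)}, \ldots, x^{(n)}$ only. The function name `histogram_density`, the interval, the number of bins, and the data are all assumptions chosen for illustration.

```python
def histogram_density(points, lo, hi, bins):
    """Return a piecewise-constant density estimate on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins

    def bin_index(x):
        # Clamp to guard against floating-point rounding at the right edge.
        return min(int((x - lo) / width), bins - 1)

    for x in points:
        if lo <= x < hi:
            counts[bin_index(x)] += 1

    n = len(points)
    # Dividing by n * width makes the estimate integrate to 1
    # when every point falls inside [lo, hi).
    heights = [c / (n * width) for c in counts]

    def density(x):
        if not (lo <= x < hi):
            return 0.0
        return heights[bin_index(x)]

    return density


# Made-up unlabeled data: note that there are no y values.
points = [0.1, 0.2, 0.3, 0.6, 0.9, 1.4, 1.6, 1.7]
density = histogram_density(points, lo=0.0, hi=2.0, bins=2)
print(density(0.5), density(1.5))
```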
(ML 1.4) Variations on supervised and unsupervised
- Semi-supervised
- Active Learning
- Decision theory
- Reinforcement Learning
(ML 1.5) Generative vs discriminative models
Given data $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$.
Let $(x, y)$ denote a prototypical (data point, label) pair.
Discriminative: model $p(y\vert x)$
Generative: model the joint distribution $p(x, y)$
A good reason to use a discriminative model: statistically, it is very hard to estimate either $f(x\vert y)$ or $f(x)$ because it takes a lot of data. (You're inclined to make mistakes.)
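To make the connection between the two approaches explicit, recall that a generative model still yields a classifier through Bayes' rule. Writing $p$ for the densities denoted $f$ above, and assuming a finite label set so that the denominator is a sum:

$$p(y \vert x) = \frac{p(x, y)}{p(x)} = \frac{p(x \vert y)\, p(y)}{\sum_{y'} p(x \vert y')\, p(y')}$$

The discriminative approach models the left-hand side directly and never has to estimate $p(x \vert y)$ or $p(x)$.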
Generative process:
(ML 1.6) $k$-Nearest Neighbor (kNN) classification algorithm
Given data $D = ((x_1, y_1), \ldots, (x_n, y_n))$, $x_i \in \mathbb{R}, y_i \in \{0, 1\}$.
Given a new data point $x$.
Classify $x$ by assigning it the label $y$ that wins the majority vote among the $k$ nearest points in the training data.
(Nearest in terms of a pre-determined metric.)
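The procedure above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the name `knn_predict`, the absolute-difference metric, and the toy data are made up here, and ties in the vote are broken arbitrarily (an odd $k$ avoids them with two classes).

```python
from collections import Counter


def knn_predict(data, x, k, dist=lambda a, b: abs(a - b)):
    """Classify x by majority vote among its k nearest training points.

    data: list of (x_i, y_i) pairs; dist: the pre-determined metric.
    """
    # Sort the training pairs by distance to x and keep the k closest.
    neighbors = sorted(data, key=lambda pair: dist(pair[0], x))[:k]
    # Count how many of the k neighbors carry each label.
    votes = Counter(y for _, y in neighbors)
    return max(votes, key=votes.get)


# Made-up training data with x_i real and y_i in {0, 1}.
D = [(0.0, 0), (0.5, 0), (1.0, 0), (2.0, 1), (2.5, 1), (3.0, 1)]

label = knn_predict(D, 1.8, k=3)
print(label)  # the 3 nearest points are 2.0, 2.5 and 1.0, so label 1 wins 2 to 1
```

Sorting the whole training set costs $O(n \log n)$ per query, which is fine for a sketch; larger data sets would call for a spatial index instead.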
Probabilistic formulation (discriminative model!): Fix $D$, $x$, $k$.
Define a random variable $Y \sim p$, where $p(y) = \#\{x_i \in N_k(x) : y_i = y\}/k$ and $N_k(x)$ denotes the set of the $k$ points of $D$ nearest to $x$.
Sometimes people write $p(y\vert x, D)$ for $p(y)$, even though it’s not really a conditional probability.
The estimate (or prediction) is given by $\hat{y} = \arg\max_{y} p(y)$, i.e. the label that wins the majority vote.
How does one choose $k$?
$\leadsto$ The important problem of choosing parameters. (Cross-validation / Bias-variance trade-off)
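One common way to approach this, sketched here under assumptions, is leave-one-out cross-validation: for each candidate $k$, predict every training point from all the others and keep the $k$ with the lowest error rate. The helper names (`knn_predict`, `loo_error`), the candidate values of $k$, and the data are hypothetical choices for this example.

```python
from collections import Counter


def knn_predict(data, x, k):
    """Majority vote among the k training points nearest to x (metric |a - b|)."""
    neighbors = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(y for _, y in neighbors)
    return max(votes, key=votes.get)


def loo_error(data, k):
    """Leave-one-out error rate of kNN with the given k."""
    mistakes = 0
    for i, (x, y) in enumerate(data):
        # Hold out point i and predict it from the remaining points.
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != y:
            mistakes += 1
    return mistakes / len(data)


# Made-up, slightly noisy training data.
D = [(0.0, 0), (0.4, 0), (0.9, 0), (1.1, 1),
     (1.5, 0), (2.0, 1), (2.6, 1), (3.0, 1)]

errors = {k: loo_error(D, k) for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

Small $k$ tends to follow the noise in the training data (high variance), while large $k$ smooths over real structure (high bias), which is where the bias-variance trade-off enters.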