Notes on Deep Reinforcement Learning
by 장승환
On this page I summarize, in a succinct and straightforward fashion, what I learn from the Deep Reinforcement Learning course by Sergey Levine, along with my own thoughts and related resources. I will update this page roughly every week until it is complete.
Acronyms
- RL: Reinforcement Learning
- DRL: Deep Reinforcement Learning
(8/23) Introduction to Reinforcement Learning
Markov chain
$\mathscr{M} = (\mathscr{S}, \mathscr{T})$
- $\mathscr{S}$ (state space) / $s \in \mathscr{S}$ (state)
- $\mathscr{T} \, (= p) : \mathscr{S} \times \mathscr{S} \rightarrow [0,1]$ (transition operator, linear), with $\mathscr{T}_{ij} = p(S_{t+1} = i \mid S_t = j)$
If we set $v_t = (p[S_t = i])_{i \in \mathscr{S}} \in [0,1]^{\vert \mathscr{S}\vert}$, then $v_{t+1} = \mathscr{T} v_t$.
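To see the linear-operator view in action, here is a minimal numpy sketch (the 3-state chain and its transition matrix are made up for illustration): repeatedly applying $\mathscr{T}$ to the state distribution $v_t$ computes $v_{t+1} = \mathscr{T} v_t$ step by step.

```python
import numpy as np

# Transition operator T for a hypothetical 3-state Markov chain:
# T[i, j] = p(S_{t+1} = i | S_t = j). Each column sums to 1,
# so T maps probability vectors to probability vectors.
T = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.7, 0.3],
    [0.0, 0.1, 0.7],
])

v = np.array([1.0, 0.0, 0.0])  # v_0: start in state 0 with probability 1

for t in range(50):
    v = T @ v  # v_{t+1} = T v_t

print(v)  # approaches the chain's stationary distribution
```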
Markov Decision Process
Extension of the Markov chain to the decision-making setting (popularized by Bellman in the 1950s)
$\mathscr{M} = (\mathscr{S}, \mathscr{A}, \mathscr{T}, r)$
- $\mathscr{S}$ (state space) / $s \in \mathscr{S}$ (state)
- $\mathscr{A}$ (action space) / $a \in \mathscr{A}$ (action)
- $\mathscr{T} : \mathscr{S} \times \mathscr{A} \times \mathscr{S} \rightarrow [0,1]$ (transition operator, now a tensor!), with $\mathscr{T}_{i,j,k} = p(S_{t+1} = i \mid S_t = j, A_t = k)$
- $r : \mathscr{S} \times \mathscr{A} \rightarrow \mathbb{R}$ (reward function)
If we set $v_t = (p[S_t = j])_{j \in \mathscr{S}} \in [0,1]^{\vert \mathscr{S}\vert}$ and $u_t = (p[A_t = k])_{k \in \mathscr{A}} \in [0,1]^{\vert \mathscr{A}\vert}$, then $v_{t+1, i} = \sum_{j, k} \mathscr{T}_{i, j, k} \, v_{t, j} \, u_{t, k}$.
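To make the tensor contraction concrete, here is a minimal numpy sketch (the state/action sizes and the random transition tensor are made up for illustration); `einsum` carries out exactly the sum over $j$ and $k$ above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Transition tensor T[i, j, k] = p(S_{t+1} = i | S_t = j, A_t = k);
# for each (j, k), the entries over i must sum to 1.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)

v = np.full(n_states, 1.0 / n_states)    # v_t: distribution over states
u = np.full(n_actions, 1.0 / n_actions)  # u_t: distribution over actions

# v_{t+1, i} = sum_{j, k} T[i, j, k] * v[j] * u[k]
v_next = np.einsum("ijk,j,k->i", T, v, u)
print(v_next, v_next.sum())  # still a probability vector (sums to 1)
```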