Introduction to Reinforcement Learning. Multi-Armed Bandits. Contextual Bandits. Finite Markov Decision Processes. Dynamic Programming. Policy Iteration. Value Iteration. Monte Carlo Methods. Temporal-Difference Learning. n-Step Bootstrapping. On-Policy Prediction with Function Approximation. On-Policy Control with Function Approximation. Off-Policy Control with Function Approximation. Policy Gradient Methods. REINFORCE. Actor-Critic. Deterministic Policy Gradients. Natural Policy Gradient. TRPO and PPO. Model-Based RL. Planning. Eligibility Traces. Hierarchical RL. POMDPs. Inverse RL. Exploration in RL. Offline RL. Multi-Agent RL.
- Site manager: Sarath Chandar Anbil Parthipan