Introduction to Reinforcement Learning. Multi-armed Bandits. Contextual Bandits. Finite Markov Decision Processes. Dynamic Programming. Policy Iteration. Value Iteration. Monte Carlo Methods. Temporal Difference Learning. n-step Bootstrapping. On-policy Prediction with Function Approximation. On-policy Control with Function Approximation. Off-policy Control with Function Approximation. Policy Gradient Methods. Model-based RL. Planning. Eligibility Traces. Hierarchical RL. POMDPs. Inverse RL. Exploration in RL. Offline RL.
- Site manager: Sarath Chandar Anbil Parthipan