
Reinforcement learning

ECTS: 6

Contact hours: 24

Course content:

Outline:

Organization of lectures in two parts (with a 15-minute break):

Skills to be acquired:

Reinforcement Learning (RL) refers to settings where the learning algorithm operates in closed loop, simultaneously using past data to adjust its decisions and taking actions that influence future observations. RL algorithms combine ideas from control, machine learning, statistics, and operations research. A common thread of all RL algorithms is the need to balance exploration (trying new things) and exploitation (choosing the most successful actions so far). This course introduces the main models (multi-armed bandits and Markov decision processes) and key ideas for algorithm design (e.g. model-based vs. model-free RL, value-based vs. policy-based algorithms, on-policy vs. off-policy learning, function approximation).
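
As an illustration of the exploration/exploitation trade-off on a multi-armed bandit, here is a minimal sketch of an epsilon-greedy strategy. It is not taken from the course material: the function name `epsilon_greedy_bandit`, the Bernoulli reward model, the arm means, and the parameter values are all assumptions chosen for the example.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (hypothetical example setup).

    true_means: success probability of each arm (unknown to the learner).
    Returns the learner's estimated mean reward and pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Exploration: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Closed loop: the chosen action determines which reward is observed.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print("estimated means:", [round(e, 2) for e in estimates])
print("pull counts:", counts)
```

With `epsilon = 0` the learner only exploits and can lock onto a poor arm after a few lucky rewards; with `epsilon = 1` it only explores and never uses what it has learned. Intermediate values trade one off against the other.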

Assessment:

Homework assignments and project 

Bibliography, recommended reading:

Books MDPs:

Books RL:

Bandit algorithms:

Implementation:

Document subject to updates - 01/04/2026
Université Paris Dauphine - PSL - Place du Maréchal de Lattre de Tassigny - 75775 PARIS Cedex 16