IMED-RL: Regret optimal learning of ergodic Markov decision processes

Authors: Fabien Pesquerel, Odalric-Ambrym Maillard

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Last, we provide numerical illustrations on classical tabular MDPs, ergodic and communicating only, showing the competitiveness of IMED-RL in finite time against state-of-the-art algorithms. In this section, we discuss the practical implementation and numerical aspects of IMED-RL and extend the discussion in Appendix F. Source code is available on GitHub. In different environments, we illustrate in Figure 2 and Figure 3 the performance of IMED-RL against the strategies UCRL3 Bourel et al. [2020], PSRL Osband et al. [2013] and Q-learning (run with discount γ = 0.99 and optimistic initialization)." (A hedged sketch of this Q-learning baseline appears after the table.)
Researcher Affiliation | Academia | Fabien Pesquerel (fabien.pesquerel@inria.fr) and Odalric-Ambrym Maillard (odalric.maillard@inria.fr), Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
Pseudocode | Yes | "Algorithm 1 IMED-RL: Indexed Minimum Empirical Divergence for Reinforcement Learning. Require: state-action space X_M of MDP M, Assumptions 1, 2, 3. Require: initial state s_1. Sample a_t ∈ argmin_{a ∈ A_{s_t}} H_{s_t,a}(t)." (A hedged sketch of this index-based action selection appears after the table.)
Open Source Code | Yes | "Source code is available on GitHub." (Footnote 7: https://github.com/fabienpesquerel/IMED-RL)
Open Datasets | No | The paper describes experiments conducted on simulated environments such as the n-state river-swim environment and the 2-room and 4-room grid-worlds. These are problem definitions (environments), not publicly available datasets with associated links, DOIs, or formal citations for data access. (A hedged sketch of a river-swim-style environment appears after the table.)
Dataset Splits | No | The paper discusses experiments on simulated MDP environments but does not specify train/validation/test splits, percentages, or sample counts for the data generated or used within these environments.
Hardware Specification | No | The paper states, "All experiments take less than an hour to run on a standard CPU." This is a general statement and does not provide specific details about the CPU model, number of cores, memory, or other hardware specifications.
Software Dependencies | No | While the paper mentions the use of Q-learning and comparisons to UCRL3 and PSRL, and states "Source code is available on GitHub," it does not provide version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used to conduct the experiments.
Experiment Setup | Yes | "In the experiments, we consider an ergodic version of the classical n-state river-swim environment, 2-room and 4-room with ε = 10⁻³, and classical communicating versions (ε = 0)." "Q-learning (run with discount γ = 0.99 and optimistic initialization)." (A hedged sketch of the ε-based ergodic construction appears after the table.)
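
Below is a minimal Python sketch of the control loop quoted in the Pseudocode row: at each step, sample an action minimizing the IMED-RL index H_{s,a}(t) in the current state. The environment interface (`reset`, `available_actions`, `step`) and the `imed_index` callable standing in for H_{s,a}(t) (which in the paper combines an empirical K_inf divergence term with log N_{s,a}(t)) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def imed_rl_control_loop(env, imed_index, horizon, rng=None):
    """Skeleton of IMED-RL's action selection (sketch, not the authors' code).

    `imed_index(state, action, t)` is assumed to return the index H_{s,a}(t);
    in the paper this combines an empirical K_inf divergence term with
    log N_{s,a}(t), which is not reproduced here.
    """
    if rng is None:
        rng = np.random.default_rng()
    state = env.reset()                          # initial state s_1
    for t in range(1, horizon + 1):
        actions = env.available_actions(state)   # admissible actions A_{s_t}
        indices = np.array([imed_index(state, a, t) for a in actions])
        # Sample a_t in the argmin of the index, breaking ties at random.
        candidates = np.flatnonzero(indices == indices.min())
        action = actions[int(rng.choice(candidates))]
        state, reward = env.step(action)         # observe reward, move to next state
    return env
```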
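The Open Datasets row notes that the benchmarks are simulated environments rather than datasets. A minimal sketch of an n-state river-swim MDP, expressed as transition and mean-reward tensors, is given below; the specific transition probabilities (0.6 / 0.35 / 0.05) and rewards (0.005 at the left end, 1.0 at the right end) are common choices in the literature and are assumptions here, not values taken from the paper.

```python
import numpy as np

def river_swim(n_states=6, p_right=0.6, p_stay=0.35, p_left=0.05,
               r_left_end=0.005, r_right_end=1.0):
    """Build transition P[s, a, s'] and mean-reward R[s, a] tensors for a
    river-swim MDP (sketch; probabilities and rewards are illustrative)."""
    LEFT, RIGHT = 0, 1
    P = np.zeros((n_states, 2, n_states))
    R = np.zeros((n_states, 2))
    for s in range(n_states):
        # Swimming left (with the current) is deterministic.
        P[s, LEFT, max(s - 1, 0)] = 1.0
        # Swimming right (against the current) is stochastic; the += handles
        # the boundary states, where probability mass folds back onto s.
        P[s, RIGHT, min(s + 1, n_states - 1)] += p_right
        P[s, RIGHT, s] += p_stay
        P[s, RIGHT, max(s - 1, 0)] += p_left
    # Small reward at the leftmost state, large reward at the rightmost one.
    R[0, LEFT] = r_left_end
    R[n_states - 1, RIGHT] = r_right_end
    return P, R
```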
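The Experiment Setup row distinguishes ergodic versions (ε = 10⁻³) from communicating versions (ε = 0) of each environment. One standard way to turn a communicating MDP into an ergodic one is to mix every transition distribution with a small amount of uniform noise, as sketched below; whether this matches the paper's exact ε-perturbation is an assumption.

```python
import numpy as np

def ergodic_version(P, epsilon=1e-3):
    """Mix every transition kernel P[s, a, :] with the uniform distribution:

        P_eps = (1 - epsilon) * P + epsilon * Uniform(states)

    With epsilon > 0 every state stays reachable under any policy, so the
    resulting MDP is ergodic; epsilon = 0 recovers the original communicating
    environment. (Sketch; the paper's exact perturbation may differ.)
    """
    n_states = P.shape[-1]
    uniform = np.full(n_states, 1.0 / n_states)
    return (1.0 - epsilon) * P + epsilon * uniform
```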
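Finally, the Q-learning baseline is described only as "run with discount γ = 0.99 and optimistic initialization". The sketch below shows one way to instantiate such a baseline; the step size, the optimistic initial value, and the greedy tie-breaking are assumptions not specified in the paper.

```python
import numpy as np

class OptimisticQLearning:
    """Tabular Q-learning with optimistic initialization (sketch).

    gamma = 0.99 follows the paper's description of the baseline; the
    optimistic value and step size are illustrative assumptions.
    """

    def __init__(self, n_states, n_actions, gamma=0.99,
                 optimistic_value=1.0 / (1.0 - 0.99), step_size=0.1):
        self.gamma = gamma
        self.step_size = step_size
        # Optimistic initialization: start every Q-value at an upper bound on
        # the discounted return (here 1/(1 - gamma), assuming rewards in [0, 1])
        # so that unvisited actions look attractive and get explored.
        self.Q = np.full((n_states, n_actions), optimistic_value)

    def act(self, state):
        # Greedy action with random tie-breaking.
        q = self.Q[state]
        return int(np.random.choice(np.flatnonzero(q == q.max())))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * self.Q[next_state].max()
        self.Q[state, action] += self.step_size * (target - self.Q[state, action])
```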