IMED-RL: Regret optimal learning of ergodic Markov decision processes
Authors: Fabien Pesquerel, Odalric-Ambrym Maillard
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Last, we provide numerical illustrations on classical tabular MDPs, ergodic and communicating only, showing the competitiveness of IMED-RL in finite-time against state-of-the-art algorithms. In this section, we discuss the practical implementation and numerical aspects of IMED-RL and extend the discussion in Appendix F. Source code is available on GitHub. In different environments, we illustrate in Figure 2 and Figure 3 the performance of IMED-RL against the strategies UCRL3 Bourel et al. [2020], PSRL Osband et al. [2013], and Q-learning (run with discount γ = 0.99 and optimistic initialization). |
| Researcher Affiliation | Academia | Fabien Pesquerel (fabien.pesquerel@inria.fr), Odalric-Ambrym Maillard (odalric.maillard@inria.fr), Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France |
| Pseudocode | Yes | Algorithm 1 (IMED-RL: Indexed Minimum Empirical Divergence for Reinforcement Learning). Require: state-action space X_M of MDP M; Assumptions 1, 2, 3. Require: initial state s_1. At each time step t, sample a_t ∈ arg min_{a ∈ A(s_t)} H_{s_t,a}(t). |
| Open Source Code | Yes | Source code is available on GitHub: https://github.com/fabienpesquerel/IMED-RL |
| Open Datasets | No | The paper describes experiments conducted on simulated environments like 'n-state river-swim environment', '2-room and 4-room' grid-worlds. These are problem definitions or environments, not specific publicly available datasets with associated links, DOIs, or formal citations for data access. |
| Dataset Splits | No | The paper discusses experiments on simulated MDP environments but does not specify train/validation/test dataset splits, percentages, or sample counts for the data generated or used within these environments. |
| Hardware Specification | No | The paper states, 'All experiments take less than an hour to run on a standard CPU.' This is a general statement and does not provide specific details about the CPU model, number of cores, memory, or other hardware specifications. |
| Software Dependencies | No | The paper mentions Q-learning and comparisons to UCRL3 and PSRL, and states that source code is available on GitHub, but it does not provide version numbers for any software dependencies (e.g., programming language, libraries, frameworks) used to conduct the experiments. |
| Experiment Setup | Yes | In the experiments, we consider an ergodic version of the classical n-state river-swim environment, 2-room and 4-room with ε = 10⁻³, and classical communicating versions (ε = 0). Q-learning (run with discount γ = 0.99 and optimistic initialization). |
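The pseudocode row above quotes IMED-RL's action rule: at each step, play the action minimizing an empirical-divergence index H_{s,a}(t). As an illustration only, the sketch below shows this index-minimization principle in the simpler Bernoulli multi-armed-bandit setting (the original IMED index of Honda and Takemura); IMED-RL's actual index additionally involves empirical transition kernels and value estimates, which this sketch omits. All function names here are illustrative, not taken from the authors' repository.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_indexes(counts, means):
    """IMED-style index H_a = N_a * KL(mu_hat_a, mu*_hat) + log N_a per action,
    where mu*_hat is the best empirical mean."""
    best = max(means)
    return [n * kl_bernoulli(m, best) + math.log(n)
            for n, m in zip(counts, means)]

def imed_action(counts, means):
    """Play the action with the smallest index (the IMED selection rule)."""
    H = imed_indexes(counts, means)
    return min(range(len(H)), key=H.__getitem__)
```

Note how the rule balances exploitation and exploration without an explicit bonus: the empirically best action has a zero KL term, so only its log-count competes, while an undersampled action keeps a small index through its small N_a and is eventually re-tried.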