Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
Authors: Victor Boone, Zihan Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experimental illustrations): To get a grasp of how PMEVI-DT behaves in practice, we provide in Fig. 2 a first round of illustrative experiments. |
| Researcher Affiliation | Academia | Victor Boone victor.boone@univ-grenoble-alpes.fr Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France Zihan Zhang zz5478@princeton.edu Princeton University |
| Pseudocode | Yes | Algorithm 1: PMEVI-DT(H, T, t ↦ M_t); Algorithm 2: PMEVI(M, β, Γ, ϵ) |
| Open Source Code | Yes | The code is provided in the supplementary material, together with the scripts to reproduce the exact figures of the paper. |
| Open Datasets | Yes | In both, the environment is a river-swim, which is a model known to be hard to learn despite its size, with high diameter and bias span; see Appendix D for the model's description. |
| Dataset Splits | No | The paper describes an RL environment ('river-swim') where data is generated through interaction, rather than using static datasets with predefined train/validation/test splits. Therefore, the concept of dataset splits as requested by the question does not directly apply. |
| Hardware Specification | No | The paper states that experiments 'took less than a hour on a low end laptop' but does not provide specific hardware details such as CPU/GPU models, memory, or processor types. |
| Software Dependencies | No | The paper mentions that the code is 'mostly written in Python' but does not specify a Python version or any specific library names with their version numbers. |
| Experiment Setup | No | The paper describes the environment and some high-level experimental conditions (e.g., river-swim size, use of prior knowledge), but does not provide specific hyperparameters like learning rates, batch sizes, or optimizer settings for the experiments. |
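
For readers unfamiliar with the river-swim environment named in the Open Datasets row, the sketch below builds a generic river-swim-style chain MDP. It is only an illustration: the function name `make_river_swim`, the number of states, the transition probabilities, and the rewards are all assumptions chosen for the example, not the values from the paper's Appendix D or from the authors' supplementary code.

```python
def make_river_swim(n_states=5, p_right=0.35, p_stay=0.6):
    """Build a hypothetical river-swim chain (illustrative parameters only).

    Returns (P, R) where P[s][a] is a list of next-state probabilities
    and R[s][a] is the mean reward. Action 0 is LEFT, action 1 is RIGHT.
    """
    p_left = 1.0 - p_right - p_stay  # probability of being pushed back
    P = [[[0.0] * n_states for _ in range(2)] for _ in range(n_states)]
    R = [[0.0, 0.0] for _ in range(n_states)]

    for s in range(n_states):
        # LEFT always succeeds (moving with the current).
        P[s][0][max(s - 1, 0)] = 1.0
        # RIGHT fights the current: it may advance, stay, or slip back.
        # At the two ends, out-of-range moves are clipped to the boundary.
        P[s][1][min(s + 1, n_states - 1)] += p_right
        P[s][1][s] += p_stay
        P[s][1][max(s - 1, 0)] += p_left

    # Small reward for idling at the left end, large reward at the right end
    # (assumed values, in the style of the usual river-swim benchmark).
    R[0][0] = 0.05
    R[n_states - 1][1] = 1.0
    return P, R


P, R = make_river_swim()
```

Because reaching the rewarding right end requires many consecutive successful RIGHT moves, chains of this kind have a large diameter relative to their size, which is the property the quoted passage refers to.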