Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

Authors: Victor Boone, Zihan Zhang

NeurIPS 2024

Reproducibility assessment (variable: result, followed by the LLM response):

Research Type: Experimental
  "To get a grasp of how PMEVI-DT behaves in practice, we provide in Fig. 2 a first round of illustrative experiments." (Section 5, "Experimental illustrations")

Researcher Affiliation: Academia
  Victor Boone (victor.boone@univ-grenoble-alpes.fr), Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France; Zihan Zhang (zz5478@princeton.edu), Princeton University.

Pseudocode: Yes
  Algorithm 1: PMEVI-DT(H, T, t ↦ Mt); Algorithm 2: PMEVI(M, β, Γ, ϵ).

Open Source Code: Yes
  "The code is provided in the supplementary material, together with the scripts to reproduce the exact figures of the paper."

Open Datasets: Yes
  "In both, the environment is a river-swim which is a model known to be hard to learn despite its size, with high diameter and bias span, see Appendix D for the model's description."

Dataset Splits: No
  The paper describes an RL environment (river-swim) in which data is generated through interaction, rather than drawn from static datasets with predefined train/validation/test splits, so the concept of dataset splits does not directly apply.

Hardware Specification: No
  The paper states that the experiments "took less than a hour on a low end laptop" but gives no specific hardware details such as CPU/GPU model, memory, or processor type.

Software Dependencies: No
  The paper mentions that the code is "mostly written in Python" but does not specify a Python version or any library names with version numbers.

Experiment Setup: No
  The paper describes the environment and some high-level experimental conditions (e.g., river-swim size, use of prior knowledge) but does not report specific hyperparameters such as learning rates, batch sizes, or optimizer settings.
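To make the river-swim remark concrete, here is a minimal sketch of a river-swim MDP as transition and reward tensors. The state count, slip probabilities, and reward values below are common choices from the average-reward RL literature, not necessarily the exact parameterization used in the paper (see its Appendix D for that).

```python
import numpy as np

def river_swim(n_states=6, p_right=0.35, p_left=0.05):
    """Build transition tensor P[s, a, s'] and expected rewards R[s, a]
    for a river-swim MDP. Actions: 0 = LEFT (deterministic),
    1 = RIGHT (stochastic drift against the current).
    Parameter values are illustrative, not the paper's."""
    P = np.zeros((n_states, 2, n_states))
    R = np.zeros((n_states, 2))
    for s in range(n_states):
        # LEFT always moves one step left (or stays at the left bank).
        P[s, 0, max(s - 1, 0)] = 1.0
        # RIGHT advances with small probability, may slip back, else stays.
        P[s, 1, min(s + 1, n_states - 1)] += p_right
        P[s, 1, max(s - 1, 0)] += p_left
        P[s, 1, s] += 1.0 - p_right - p_left
    R[0, 0] = 0.005           # small reward for idling at the left bank
    R[n_states - 1, 1] = 1.0  # large reward for reaching the right end
    return P, R

P, R = river_swim()
assert np.allclose(P.sum(axis=2), 1.0)  # each (s, a) row is a distribution
```

The long chain of states, combined with the rarely rewarded right end, is what gives river-swim its large diameter and bias span, which is why it serves as a hard benchmark for regret-minimizing algorithms such as PMEVI-DT.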