Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

Authors: Victor Boone, Zihan Zhang

NeurIPS 2024

Reproducibility assessment (variable: result, followed by the LLM response):

Research Type: Experimental
  "To get a grasp of how PMEVI-DT behaves in practice, we provide in Fig. 2 a first round of illustrative experiments." (Section 5, "Experimental illustrations")

Researcher Affiliation: Academia
  Victor Boone (victor.boone@univ-grenoble-alpes.fr), Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France; Zihan Zhang (zz5478@princeton.edu), Princeton University.

Pseudocode: Yes
  Algorithm 1: PMEVI-DT(H, T, t ↦ Mt); Algorithm 2: PMEVI(M, β, Γ, ϵ).

Open Source Code: Yes
  "The code is provided in the supplementary material, together with the scripts to reproduce the exact figures of the paper."

Open Datasets: Yes
  "In both, the environment is a river-swim which is a model known to be hard to learn despite its size, with high diameter and bias span, see Appendix D for the model's description."

Dataset Splits: No
  The paper describes an RL environment (river-swim) in which data is generated through interaction, rather than drawn from static datasets with predefined train/validation/test splits, so the concept of dataset splits does not directly apply.

Hardware Specification: No
  The paper states that the experiments "took less than a hour on a low end laptop" but gives no specific hardware details such as CPU/GPU model, memory, or processor type.

Software Dependencies: No
  The paper mentions that the code is "mostly written in Python" but does not specify a Python version or any library names with version numbers.

Experiment Setup: No
  The paper describes the environment and some high-level experimental conditions (e.g., river-swim size, use of prior knowledge) but does not report specific hyperparameters such as learning rates, batch sizes, or optimizer settings.
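To make the river-swim remark concrete, here is a minimal sketch of a river-swim MDP as transition and reward tensors. The state count, slip probabilities, and reward values below are common choices from the average-reward RL literature, not necessarily the exact parameterization used in the paper (see its Appendix D for that).

```python
import numpy as np

def river_swim(n_states=6, p_right=0.35, p_left=0.05):
    """Build transition tensor P[s, a, s'] and expected rewards R[s, a]
    for a river-swim MDP. Actions: 0 = LEFT (deterministic),
    1 = RIGHT (stochastic drift against the current).
    Parameter values are illustrative, not the paper's."""
    P = np.zeros((n_states, 2, n_states))
    R = np.zeros((n_states, 2))
    for s in range(n_states):
        # LEFT always moves one step left (or stays at the left bank).
        P[s, 0, max(s - 1, 0)] = 1.0
        # RIGHT advances with small probability, may slip back, else stays.
        P[s, 1, min(s + 1, n_states - 1)] += p_right
        P[s, 1, max(s - 1, 0)] += p_left
        P[s, 1, s] += 1.0 - p_right - p_left
    R[0, 0] = 0.005           # small reward for idling at the left bank
    R[n_states - 1, 1] = 1.0  # large reward for reaching the right end
    return P, R

P, R = river_swim()
assert np.allclose(P.sum(axis=2), 1.0)  # each (s, a) row is a distribution
```

The long chain of states, combined with the rarely rewarded right end, is what gives river-swim its large diameter and bias span, which is why it serves as a hard benchmark for regret-minimizing algorithms such as PMEVI-DT.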