Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Authors: Jiafan He, Dongruo Zhou, Quanquan Gu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a model-based algorithm named UCBVI-γ, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. We show that UCBVI-γ achieves an $\tilde{O}(\sqrt{SAT}/(1-\gamma)^{1.5})$ regret... In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}(\sqrt{SAT}/(1-\gamma)^{1.5})$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs.
Researcher Affiliation | Academia | Jiafan He, Department of Computer Science, University of California, Los Angeles, CA 90095, USA, EMAIL; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, CA 90095, USA, EMAIL; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, USA, EMAIL
Pseudocode | Yes | Algorithm 1: Upper Confidence Value-iteration UCBVI-γ
Open Source Code | No | The paper states "[N/A]" for including code/data/instructions needed to reproduce main experimental results and does not provide any other information about source code availability.
Open Datasets | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no datasets were used in an experimental setting.
Dataset Splits | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no dataset splits for training/validation were used.
Hardware Specification | No | The paper states "[N/A]" for questions related to compute resources, and no hardware specifications are mentioned as it is a theoretical work.
Software Dependencies | No | The paper states "[N/A]" for questions related to training details and compute resources, and does not mention any specific software dependencies or versions.
Experiment Setup | No | The paper focuses on theoretical analysis and proofs, and explicitly states "[N/A]" for questions related to training details and experimental setup.
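
To make the regret rate quoted in the Research Type row concrete, the following minimal sketch evaluates its leading term, sqrt(S*A*T) / (1 - γ)^1.5, with constants and logarithmic factors dropped. The function name `regret_rate` and the example values of S, A, T, and γ are illustrative assumptions and do not come from the paper.

```python
import math


def regret_rate(S: int, A: int, T: int, gamma: float) -> float:
    """Leading term sqrt(S*A*T) / (1 - gamma)**1.5 of the quoted bound.

    Constants and logarithmic factors hidden by the tilde notation are
    dropped, so the value shows scaling only, not an actual regret.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return math.sqrt(S * A * T) / (1.0 - gamma) ** 1.5


# Hypothetical problem sizes, chosen for illustration only.
base = regret_rate(S=10, A=5, T=20_000, gamma=0.75)

# Quadrupling T doubles the rate (square-root dependence on T).
more_steps = regret_rate(S=10, A=5, T=80_000, gamma=0.75)

# Shrinking (1 - gamma) by a factor of 4 multiplies the rate by 4**1.5 = 8.
longer_horizon = regret_rate(S=10, A=5, T=20_000, gamma=0.9375)
```

Because the upper and lower bounds share this leading term, the same scaling describes both the regret UCBVI-γ is shown to achieve and the regret no algorithm can beat, up to logarithmic factors.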