Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Authors: Jiafan He, Dongruo Zhou, Quanquan Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a model-based algorithm named UCBVI-γ, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. We show that UCBVI-γ achieves an $\tilde{O}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$ regret... In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs. |
| Researcher Affiliation | Academia | Jiafan He, Department of Computer Science, University of California, Los Angeles, CA 90095, USA; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, CA 90095, USA; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, USA |
| Pseudocode | Yes | Algorithm 1 Upper Confidence Value-iteration UCBVI-γ |
| Open Source Code | No | The paper states "[N/A]" for including code/data/instructions needed to reproduce main experimental results and does not provide any other information about source code availability. |
| Open Datasets | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no datasets were used in an experimental setting. |
| Dataset Splits | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no dataset splits for training/validation were used. |
| Hardware Specification | No | The paper states "[N/A]" for questions related to compute resources, and no hardware specifications are mentioned as it is a theoretical work. |
| Software Dependencies | No | The paper states "[N/A]" for questions related to training details and compute resources, and does not mention any specific software dependencies or versions. |
| Experiment Setup | No | The paper focuses on theoretical analysis and proofs, and explicitly states "[N/A]" for questions related to training details and experimental setup. |
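
To get a feel for the rate quoted in the Research Type row, the sketch below evaluates the leading term sqrt(S·A·T)/(1−γ)^1.5 for given problem sizes. It is only an illustration: the function name `regret_rate` and the sample values are hypothetical, and the constants and logarithmic factors hidden by the tilde notation are ignored, so the numbers indicate scaling only, not actual regret.

```python
import math


def regret_rate(num_states, num_actions, num_steps, gamma):
    """Leading term sqrt(S*A*T) / (1 - gamma)^1.5 of the quoted bound.

    Assumption: constants and log factors are dropped, so only ratios
    between two calls are meaningful, not the absolute values.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if min(num_states, num_actions, num_steps) <= 0:
        raise ValueError("S, A and T must be positive")
    return math.sqrt(num_states * num_actions * num_steps) / (1.0 - gamma) ** 1.5


if __name__ == "__main__":
    base = regret_rate(4, 4, 10_000, 0.75)
    longer = regret_rate(4, 4, 40_000, 0.75)      # 4x more steps
    harder = regret_rate(4, 4, 10_000, 0.9375)    # 1 - gamma shrinks by 4x
    print(f"base rate:            {base:.1f}")
    print(f"4x steps ratio:       {longer / base:.2f}")
    print(f"4x smaller 1 - gamma: {harder / base:.2f}")
```

Under these assumptions, quadrupling T doubles the rate, while shrinking 1−γ by a factor of four multiplies it by eight, which is the (1−γ)^{-1.5} dependence the upper and lower bounds agree on.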