reproducibilityindex.ai

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Authors: Jiafan He, Dongruo Zhou, Quanquan Gu

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose a model-based algorithm named UCBVI-γ, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. We show that UCBVI-γ achieves an $\tilde{O}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$ regret... In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs.
Researcher Affiliation	Academia	Jiafan He Department of Computer Science University of California, Los Angeles CA 90095, USA jiafanhe19@ucla.edu Dongruo Zhou Department of Computer Science University of California, Los Angeles CA 90095, USA drzhou@cs.ucla.edu Quanquan Gu Department of Computer Science University of California, Los Angeles CA 90095, USA qgu@cs.ucla.edu
Pseudocode	Yes	Algorithm 1 Upper Confidence Value-iteration UCBVI-γ
Open Source Code	No	The paper states "[N/A]" for the question on including the code, data, and instructions needed to reproduce the main experimental results, and provides no other information about source code availability.
Open Datasets	No	The paper states "[N/A]" for questions related to data and experiments, indicating that no datasets were used in an experimental setting.
Dataset Splits	No	The paper states "[N/A]" for questions related to data and experiments, indicating that no dataset splits for training/validation were used.
Hardware Specification	No	The paper states "[N/A]" for questions related to compute resources, and no hardware specifications are mentioned as it is a theoretical work.
Software Dependencies	No	The paper states "[N/A]" for questions related to training details and compute resources, and does not mention any specific software dependencies or versions.
Experiment Setup	No	The paper focuses on theoretical analysis and proofs, and explicitly states "[N/A]" for questions related to training details and experimental setup.
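
Because the paper ships no code, the short Python sketch below may help readers build intuition for the rate quoted in the Research Type row, which scales as sqrt(S·A·T) / (1 − γ)^1.5, with S the number of states, A the number of actions, T the number of steps, and γ the discount factor. It is not taken from the paper and is not an implementation of UCBVI-γ. The function name `regret_rate` is hypothetical, and constants and logarithmic factors are dropped, so the output shows only how the rate scales and cannot be compared with measured regret.

```python
import math


def regret_rate(S: int, A: int, T: int, gamma: float) -> float:
    """Leading-order term sqrt(S * A * T) / (1 - gamma) ** 1.5.

    Hypothetical helper for illustration only: constants and logarithmic
    factors are omitted, so this is a scaling guide, not a usable bound.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if min(S, A, T) <= 0:
        raise ValueError("S, A and T must be positive")
    return math.sqrt(S * A * T) / (1.0 - gamma) ** 1.5


if __name__ == "__main__":
    base = regret_rate(S=10, A=5, T=10_000, gamma=0.9)
    # Quadrupling T doubles the rate (square-root dependence on T).
    print(regret_rate(10, 5, 40_000, 0.9) / base)
    # Moving gamma from 0.9 to 0.99 shrinks (1 - gamma) tenfold,
    # so the rate grows by a factor of 10 ** 1.5.
    print(regret_rate(10, 5, 10_000, 0.99) / base)
```

The second ratio illustrates why the (1 − γ)^1.5 dependence matters: as γ approaches 1, the effective horizon grows and the rate increases much faster than it does with S, A, or T.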