reproducibilityindex.ai

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Authors: Yi Tian, Jian Qian, Suvrit Sra

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The ﬁrst one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difﬁculty hiding in the intricacy of the structures.
Researcher Affiliation	Academia	Yi Tian* Department of EECS MIT Cambridge, MA 02139 yitian@mit.edu Jian Qian Department of EECS MIT Cambridge, MA 02139 jianqian@mit.edu Suvrit Sra Department of EECS MIT Cambridge, MA 02139 suvrit@mit.edu
Pseudocode	Yes	Algorithm 1 F-OVI (Factored Optimistic Value Iteration) and Algorithm 2 VI_Optimism (Value Iteration with Optimism)
Open Source Code	No	No explicit statement or link indicating the release of open-source code for the described methodology was found.
Open Datasets	No	This is a theoretical paper that does not use datasets for empirical evaluation, thus no information on publicly available training data is provided.
Dataset Splits	No	This is a theoretical paper that does not conduct experiments with data, so there are no specified training, validation, or test splits.
Hardware Specification	No	This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for computations.
Software Dependencies	No	This is a theoretical paper, and as such, it does not detail specific software dependencies with version numbers that would be required to reproduce experiments.
Experiment Setup	No	This is a theoretical paper that focuses on algorithm design and proofs; therefore, it does not include details about an experimental setup or specific hyperparameters.