Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Authors: Yi Tian, Jian Qian, Suvrit Sra

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.
Researcher Affiliation Academia Yi Tian* Department of EECS MIT Cambridge, MA 02139 yitian@mit.edu Jian Qian Department of EECS MIT Cambridge, MA 02139 jianqian@mit.edu Suvrit Sra Department of EECS MIT Cambridge, MA 02139 suvrit@mit.edu
Pseudocode Yes Algorithm 1 F-OVI (Factored Optimistic Value Iteration) and Algorithm 2 VI_Optimism (Value Iteration with Optimism)
Open Source Code No No explicit statement or link indicating the release of open-source code for the described methodology was found.
Open Datasets No This is a theoretical paper that does not use datasets for empirical evaluation, thus no information on publicly available training data is provided.
Dataset Splits No This is a theoretical paper that does not conduct experiments with data, so there are no specified training, validation, or test splits.
Hardware Specification No This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for computations.
Software Dependencies No This is a theoretical paper, and as such, it does not detail specific software dependencies with version numbers that would be required to reproduce experiments.
Experiment Setup No This is a theoretical paper that focuses on algorithm design and proofs; therefore, it does not include details about an experimental setup or specific hyperparameters.