Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Authors: Yi Tian, Jian Qian, Suvrit Sra
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures. |
| Researcher Affiliation | Academia | Yi Tian* Department of EECS MIT Cambridge, MA 02139 yitian@mit.edu Jian Qian Department of EECS MIT Cambridge, MA 02139 jianqian@mit.edu Suvrit Sra Department of EECS MIT Cambridge, MA 02139 suvrit@mit.edu |
| Pseudocode | Yes | Algorithm 1 F-OVI (Factored Optimistic Value Iteration) and Algorithm 2 VI_Optimism (Value Iteration with Optimism) |
| Open Source Code | No | No explicit statement or link indicating the release of open-source code for the described methodology was found. |
| Open Datasets | No | This is a theoretical paper that does not use datasets for empirical evaluation, thus no information on publicly available training data is provided. |
| Dataset Splits | No | This is a theoretical paper that does not conduct experiments with data, so there are no specified training, validation, or test splits. |
| Hardware Specification | No | This is a theoretical paper focusing on algorithms and regret bounds, and it does not describe any experimental setup or hardware used for computations. |
| Software Dependencies | No | This is a theoretical paper, and as such, it does not detail specific software dependencies with version numbers that would be required to reproduce experiments. |
| Experiment Setup | No | This is a theoretical paper that focuses on algorithm design and proofs; therefore, it does not include details about an experimental setup or specific hyperparameters. |