Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Authors: Jiafan He, Dongruo Zhou, Quanquan Gu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a model-based algorithm named UCBVI-γ, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. We show that UCBVI-γ achieves an $\widetilde{O}(\sqrt{SAT}/(1-\gamma)^{1.5})$ regret... In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\widetilde{\Omega}(\sqrt{SAT}/(1-\gamma)^{1.5})$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs.
Researcher Affiliation | Academia | Jiafan He, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (jiafanhe19@ucla.edu); Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (drzhou@cs.ucla.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: Upper Confidence Value Iteration (UCBVI-γ); a hedged sketch of this style of algorithm appears after the table.
Open Source Code | No | The paper states "[N/A]" for including code/data/instructions needed to reproduce the main experimental results and does not provide any other information about source code availability.
Open Datasets | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no datasets were used in an experimental setting.
Dataset Splits | No | The paper states "[N/A]" for questions related to data and experiments, indicating that no training/validation dataset splits were used.
Hardware Specification | No | The paper states "[N/A]" for questions related to compute resources; no hardware specifications are mentioned, as this is a theoretical work.
Software Dependencies | No | The paper states "[N/A]" for questions related to training details and compute resources and does not mention any specific software dependencies or versions.
Experiment Setup | No | The paper focuses on theoretical analysis and proofs and explicitly states "[N/A]" for questions related to training details and experimental setup.
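
For intuition about what the pseudocode describes, here is a minimal Python sketch of a tabular optimistic value-iteration loop with a Bernstein-type exploration bonus for a discounted MDP. This is an illustrative approximation, not the paper's Algorithm 1: the environment interface `env_step`, the update schedule, the confidence parameter `delta`, and the bonus constants are all assumptions chosen to keep the sketch short.

```python
import numpy as np

def ucbvi_gamma_sketch(S, A, gamma, T, env_step, delta=0.05):
    """Simplified sketch of optimistic value iteration with a Bernstein-style
    bonus for a discounted tabular MDP with S states, A actions, rewards in
    [0, 1]. `env_step(s, a) -> (reward, next_state)` is an assumed interface;
    it is not part of the paper."""
    counts = np.zeros((S, A, S))          # empirical transition counts
    rewards = np.zeros((S, A))            # empirical mean rewards
    H = 1.0 / (1 - gamma)                 # effective horizon / max value
    Q = np.full((S, A), H)                # optimistic initialization
    V = Q.max(axis=1)
    log_term = np.log(S * A * max(T, 2) / delta)
    s = 0                                 # arbitrary start state

    for t in range(T):
        a = int(np.argmax(Q[s]))          # act greedily w.r.t. optimistic Q
        r, s_next = env_step(s, a)
        n = counts[s, a].sum()
        rewards[s, a] = (rewards[s, a] * n + r) / (n + 1)
        counts[s, a, s_next] += 1

        # Periodically re-solve the empirical model optimistically
        # (illustrative schedule: once every ~H steps).
        if (t + 1) % max(1, int(H)) == 0:
            for _ in range(int(H * np.log(max(T, 2))) + 1):
                for ss in range(S):
                    for aa in range(A):
                        n_sa = counts[ss, aa].sum()
                        if n_sa == 0:
                            Q[ss, aa] = H
                            continue
                        p_hat = counts[ss, aa] / n_sa
                        # Bernstein-type bonus: variance-dependent main term
                        # plus a lower-order 1/n term (constants illustrative).
                        var_V = p_hat @ (V - p_hat @ V) ** 2
                        bonus = (np.sqrt(2 * var_V * log_term / n_sa)
                                 + H * log_term / n_sa)
                        Q[ss, aa] = min(
                            rewards[ss, aa] + gamma * (p_hat @ V) + bonus, H)
                V = Q.max(axis=1)
        s = s_next
    return Q
```

The variance-aware bonus is what distinguishes a Bernstein-type construction from a Hoeffding-type one: the main term scales with the empirical variance of the next-state value rather than with its range, which is the ingredient behind the improved $1/(1-\gamma)^{1.5}$ dependence in the paper's regret bound.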