Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
Authors: Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. [...] We design a computational efficient algorithm to achieve near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes using $O(H + \log_2\log_2(K))$ batches with confidence parameter $\delta$. [...] Our technical contribution are two-fold: 1) a near-optimal design scheme to explore over the unlearned states; 2) an computational efficient algorithm to explore certain directions with an approximated transition model. |
| Researcher Affiliation | Academia | Department of Automation, Tsinghua University, EMAIL; Department of Automation, Tsinghua University, EMAIL; Yau Mathematical Sciences Center & Department of Mathematical Sciences, Tsinghua University, EMAIL; Department of Automation, Tsinghua University, EMAIL |
| Pseudocode | Yes | Algorithm 1 Main Algorithm, Algorithm 2 Raw Exploration, Algorithm 3 Policy Elimination |
| Open Source Code | No | The paper does not contain any statements about releasing source code, a link to a code repository, or information about code in supplementary materials. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training on specific datasets. |
| Dataset Splits | No | As the paper is theoretical and does not involve empirical experiments, it does not mention validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe the specific hardware used for any experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations. |