Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning
Authors: Junyan Liu, Yunfan Li, Ruosong Wang, Lin Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper is theoretically oriented and does not conduct any experiment. |
| Researcher Affiliation | Academia | Junyan Liu (University of Washington, junyanl1@cs.washington.edu); Yunfan Li (University of California, Los Angeles, yunfanli@g.ucla.edu); Ruosong Wang (CFCS and School of Computer Science, Peking University, ruosongwang@pku.edu.cn); Lin F. Yang (University of California, Los Angeles, linyang@ee.ucla.edu) |
| Pseudocode | Yes | Algorithm 1: Elimination framework for ULI; Algorithm 2: PE with adaptive barycentric spanner; Algorithm 3: Tabular Episodic MDPs with ULI guarantee; Algorithm 4: Uniform estimation for value functions; Algorithm 5: Construct estimated value function |
| Open Source Code | No | The paper neither states that code for the described methodology will be released nor links to a repository. The NeurIPS checklist marks code access as "NA", stating "This paper is theoretically oriented and does not conduct any experiment." |
| Open Datasets | No | The paper is theoretical and does not perform experiments with datasets, thus no training dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments, thus no dataset split information for training, validation, or testing is provided. |
| Hardware Specification | No | The paper is theoretical and does not involve computational experiments, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe computational experiments or their software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include an experimental setup with specific hyperparameters or training configurations. |