Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Role of Coverage in Online Reinforcement Learning
Authors: Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While our results primarily concern analysis of existing algorithms rather than algorithm design, they highlight a number of exciting directions for future research, and we are optimistic that the notion of coverability can guide the design of practical algorithms going forward. |
| Researcher Affiliation | Collaboration | Tengyang Xie (UIUC), Dylan J. Foster (Microsoft Research), Yu Bai (Salesforce Research), Nan Jiang (UIUC), Sham M. Kakade (Harvard University) |
| Pseudocode | Yes | Algorithm 1 GOLF (Jin et al., 2021a) input: Function class F, confidence width β > 0. [...] Algorithm 2 Reward-Free Exploration with GOLF [...] Algorithm 3 Offline GOLF with Exploration Data and Target Reward |
| Open Source Code | No | The paper does not provide any statement about making its code open-source or provide a link to a code repository. |
| Open Datasets | No | As a theoretical paper focused on analysis and proofs, it does not conduct experiments on datasets, thus no training data is mentioned. |
| Dataset Splits | No | As a theoretical paper focused on analysis and proofs, it does not describe experimental validation on data, thus no dataset splits are provided. |
| Hardware Specification | No | As a theoretical paper, it does not describe any experimental setup or the hardware used for computations. |
| Software Dependencies | No | As a theoretical paper, it does not describe any experimental setup or specific software dependencies with version numbers. |
| Experiment Setup | No | As a theoretical paper, it does not describe an experimental setup with hyperparameters or training settings. |