On Gap-dependent Bounds for Offline Reinforcement Learning
Authors: Xinqi Wang, Qiwen Cui, Simon S. Du
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. ... Lastly, we present nearly-matching lower bounds to complement our gap-dependent upper bounds. ... 1.1 Main Contributions We present novel analyses for the standard VI-LCB algorithm (Algorithm 2). ... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Researcher Affiliation | Academia | Xinqi Wang, Institute for Interdisciplinary Information Sciences, Tsinghua University (wangxqkaxdd@gmail.com); Qiwen Cui, Paul G. Allen School of Computer Science & Engineering, University of Washington (qwcui@cs.washington.edu); Simon S. Du, Paul G. Allen School of Computer Science & Engineering, University of Washington (ssdu@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 1: VI-LCB ... Algorithm 2: Subsampled VI-LCB |
| Open Source Code | No | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Open Datasets | No | The paper is theoretical and involves no empirical experiments, datasets, or training. The paper checklist states: "(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]" |
| Dataset Splits | No | The paper is theoretical and involves no empirical experiments, so it specifies no training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and conducts no experiments. The paper checklist states: "(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]" |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments. Therefore, it does not list software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training settings for empirical runs. |
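The paper releases no code. To give a concrete picture of the pessimism principle behind the VI-LCB pseudocode, here is a minimal Python sketch of lower-confidence-bound value iteration for a tabular MDP, assuming a finite-horizon setting with rewards in [0, 1]. It is not the authors' Algorithm 1. The Hoeffding-style penalty, the constant `c_b`, and the function name `vi_lcb` are illustrative assumptions. The data-splitting step of Subsampled VI-LCB (Algorithm 2) is also omitted.

```python
import numpy as np


def vi_lcb(dataset, S, A, H, delta=0.1, c_b=1.0):
    """Sketch of pessimistic (LCB) value iteration for a finite-horizon tabular MDP.

    dataset: iterable of (h, s, a, r, s_next) transitions with h in 0..H-1.
    c_b: hypothetical penalty constant, not taken from the paper.
    Returns a greedy policy pi[h, s] and lower value estimates V[h, s].
    """
    counts = np.zeros((H, S, A))
    reward_sum = np.zeros((H, S, A))
    trans_counts = np.zeros((H, S, A, S))
    for h, s, a, r, s_next in dataset:
        counts[h, s, a] += 1
        reward_sum[h, s, a] += r
        trans_counts[h, s, a, s_next] += 1

    # Empirical model; max(., 1) avoids division by zero for unvisited pairs.
    n = np.maximum(counts, 1)
    r_hat = reward_sum / n
    p_hat = trans_counts / n[..., None]

    # Assumed Hoeffding-style penalty, capped at H; unvisited pairs get the full penalty H.
    log_term = np.log(S * A * H / delta)
    bonus = np.minimum(c_b * H * np.sqrt(log_term / n), H)
    bonus[counts == 0] = H

    V = np.zeros((H + 1, S))  # V[H] = 0 is the terminal value
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        # Pessimistic Q: empirical Bellman backup minus penalty, clipped at zero.
        q = r_hat[h] + p_hat[h] @ V[h + 1] - bonus[h]
        q = np.maximum(q, 0.0)
        pi[h] = np.argmax(q, axis=1)
        V[h] = q.max(axis=1)
    return pi, V[:H]
```

As a usage example, consider one state, two actions, and horizon 1. Action 0 is seen once with reward 1, and action 1 is seen 400 times with reward 0.9. The large penalty on the rarely observed action drives its lower bound to zero, so the sketch prefers the well-covered action 1. This is the behavior that pessimism is meant to produce under partial data coverage.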