Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
Authors: Dilip Arumugam, Benjamin Van Roy
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity. ... 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Researcher Affiliation | Academia | Dilip Arumugam, Department of Computer Science, Stanford University (dilip@cs.stanford.edu); Benjamin Van Roy, Department of Electrical Engineering and Department of Management Science & Engineering, Stanford University (bvr@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Posterior Sampling for Reinforcement Learning (PSRL) [152] ... Algorithm 2: Value-equivalent Sampling for Reinforcement Learning (VSRL) |
| Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Open Datasets | No | The paper is theoretical and does not describe experiments using datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and ran no experiments, so it does not describe any hardware. |
| Software Dependencies | No | The paper is theoretical and does not list software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details on experimental setup or hyperparameters. |