Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

Authors: Dilip Arumugam, Benjamin Van Roy

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity. ... 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
Researcher Affiliation | Academia | Dilip Arumugam, Department of Computer Science, Stanford University, dilip@cs.stanford.edu; Benjamin Van Roy, Department of Electrical Engineering, Department of Management Science & Engineering, Stanford University, bvr@stanford.edu
Pseudocode | Yes | Algorithm 1 Posterior Sampling for Reinforcement Learning (PSRL) [152] ... Algorithm 2 Value-equivalent Sampling for Reinforcement Learning (VSRL)
Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
Open Datasets | No | The paper is theoretical and does not describe experiments using datasets.
Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and ran no experiments, so it does not describe any hardware.
Software Dependencies | No | The paper is theoretical and does not list software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not provide details on experimental setup or hyperparameters.