reproducibilityindex.ai

Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

Authors: Dilip Arumugam, Benjamin Van Roy

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity. ... 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
Researcher Affiliation	Academia	Dilip Arumugam, Department of Computer Science, Stanford University, dilip@cs.stanford.edu; Benjamin Van Roy, Department of Electrical Engineering, Department of Management Science & Engineering, Stanford University, bvr@stanford.edu
Pseudocode	Yes	Algorithm 1 Posterior Sampling for Reinforcement Learning (PSRL) [152] ... Algorithm 2 Value-equivalent Sampling for Reinforcement Learning (VSRL)
Open Source Code	No	3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
Open Datasets	No	The paper is theoretical and does not describe experiments using datasets.
Dataset Splits	No	The paper is theoretical and does not describe dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and ran no experiments, so it does not describe any hardware.
Software Dependencies	No	The paper is theoretical and does not list software dependencies with specific version numbers.
Experiment Setup	No	The paper is theoretical and does not provide details on experimental setup or hyperparameters.