Self-Consistent Models and Values

Authors: Greg Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado P. van Hasselt, David Silver

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate self-consistency both in a tabular setting and at scale, in the context of deep RL.
Researcher Affiliation | Collaboration | Gregory Farquhar (DeepMind), Kate Baumli (DeepMind), Zita Marinho (DeepMind), Angelos Filos (University of Oxford), Matteo Hessel (DeepMind), Hado van Hasselt (DeepMind), David Silver (DeepMind)
Pseudocode | Yes | Algorithm 1: Model-based RL with joint grounded and self-consistency updates. (An illustrative sketch of such an update follows the table below.)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We do not provide code, but the experimental setup is described in detail in the supplemental material.
Open Datasets | Yes | In our first set of experiments, we used random Garnet MDPs [2] to study different combinations of grounded and self-consistent updates for approximate models and values. (A sketch of the standard Garnet construction follows the table below.)
Dataset Splits | No | The paper states, "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please refer to the supplemental material." This indicates that such details, including dataset splits, are not in the main body of the paper.
Hardware Specification | No | The paper states, "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to the supplemental material." This indicates that specific hardware details are not provided in the main text.
Software Dependencies | No | The paper mentions software such as JAX [9] and the DeepMind JAX ecosystem [3], but it does not provide specific version numbers for these or any other software dependencies required to replicate the experiments.
Experiment Setup | No | The paper states that "The only difference is in the hyperparameters for batch size and replay buffer, as documented in the Appendix", and the reproducibility checklist confirms that training details such as hyperparameters are deferred to the supplemental material, meaning they are not specified in the main text.
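
The Pseudocode row above refers to Algorithm 1, model-based RL with joint grounded and self-consistency updates. As an illustration of what such a joint update could look like, here is a minimal tabular sketch; the data structures, the deterministic model, the `sc_weight` parameter, and the direction of the self-consistency update (values pulled toward model-predicted targets) are all assumptions made for illustration, not the paper's Algorithm 1.

```python
import numpy as np

def joint_update(V, model, s, a, r, s_next, gamma=0.99, lr=0.1, sc_weight=1.0):
    """One illustrative grounded + self-consistency update (hypothetical sketch).

    V     : value table, shape [n_states]
    model : dict with 'r' (reward table, [n_states, n_actions]) and
            's_next' (deterministic successor table, int, [n_states, n_actions])
    (s, a, r, s_next) : one real transition from the environment
    """
    # Grounded updates: fit the model and the value function to real data.
    model['r'][s, a] += lr * (r - model['r'][s, a])
    model['s_next'][s, a] = s_next
    V[s] += lr * ((r + gamma * V[s_next]) - V[s])

    # Self-consistency update: move V[s] toward the target implied by the
    # learned model, so that values and model predictions agree even where
    # no grounded target is available.
    s_hat = model['s_next'][s, a]
    sc_target = model['r'][s, a] + gamma * V[s_hat]
    V[s] += lr * sc_weight * (sc_target - V[s])
```

In the paper the same idea is studied both in the tabular setting and with deep function approximation; the tabular form above is only meant to make the two kinds of update concrete.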
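
The Open Datasets row quotes the use of random Garnet MDPs [2]. For readers unfamiliar with them, a Garnet MDP is a randomly generated finite MDP parameterised by a number of states, a number of actions, and a branching factor; the NumPy sketch below shows the standard construction (parameter names and the reward distribution are illustrative, not the paper's exact generator).

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, seed=0):
    """Sample a random Garnet MDP (generic construction, not the paper's code).

    Each (state, action) pair transitions to `branching` randomly chosen
    successor states with random probabilities; rewards are drawn at random.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))   # transition probabilities
    R = rng.normal(size=(n_states, n_actions))      # per (s, a) rewards
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            P[s, a, successors] = rng.dirichlet(np.ones(branching))
    return P, R

P, R = make_garnet(n_states=10, n_actions=2, branching=3)
```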