Self-Consistent Models and Values

Authors: Greg Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado P. van Hasselt, David Silver

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate self-consistency both in a tabular setting and at scale, in the context of deep RL.
Researcher Affiliation | Collaboration | Gregory Farquhar (DeepMind), Kate Baumli (DeepMind), Zita Marinho (DeepMind), Angelos Filos (University of Oxford), Matteo Hessel (DeepMind), Hado van Hasselt (DeepMind), David Silver (DeepMind)
Pseudocode | Yes | Algorithm 1: Model-based RL with joint grounded and self-consistency updates. (An illustrative sketch of such an update follows the table below.)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We do not provide code, but the experimental setup is described in detail in the supplemental material.
Open Datasets | Yes | In our first set of experiments, we used random Garnet MDPs [2] to study different combinations of grounded and self-consistent updates for approximate models and values. (A sketch of the standard Garnet construction follows the table below.)
Dataset Splits | No | The paper states, "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please refer to the supplemental material." This indicates that such details, including dataset splits, are not in the main body of the paper.
Hardware Specification | No | The paper states, "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to the supplemental material." This indicates that specific hardware details are not provided in the main text.
Software Dependencies | No | The paper mentions software such as JAX [9] and the DeepMind JAX ecosystem [3], but it does not provide specific version numbers for these or any other software dependencies required to replicate the experiments.
Experiment Setup | No | The paper states that "The only difference is in the hyperparameters for batch size and replay buffer, as documented in the Appendix", and the reproducibility checklist confirms that training details such as hyperparameters are deferred to the supplemental material, meaning they are not specified in the main text.
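
The Pseudocode row above refers to Algorithm 1, model-based RL with joint grounded and self-consistency updates. As an illustration of what such a joint update could look like, here is a minimal tabular sketch; the data structures, the deterministic model, the `sc_weight` parameter, and the direction of the self-consistency update (values pulled toward model-predicted targets) are all assumptions made for illustration, not the paper's Algorithm 1.

```python
import numpy as np

def joint_update(V, model, s, a, r, s_next, gamma=0.99, lr=0.1, sc_weight=1.0):
    """One illustrative grounded + self-consistency update (hypothetical sketch).

    V     : value table, shape [n_states]
    model : dict with 'r' (reward table, [n_states, n_actions]) and
            's_next' (deterministic successor table, int, [n_states, n_actions])
    (s, a, r, s_next) : one real transition from the environment
    """
    # Grounded updates: fit the model and the value function to real data.
    model['r'][s, a] += lr * (r - model['r'][s, a])
    model['s_next'][s, a] = s_next
    V[s] += lr * ((r + gamma * V[s_next]) - V[s])

    # Self-consistency update: move V[s] toward the target implied by the
    # learned model, so that values and model predictions agree even where
    # no grounded target is available.
    s_hat = model['s_next'][s, a]
    sc_target = model['r'][s, a] + gamma * V[s_hat]
    V[s] += lr * sc_weight * (sc_target - V[s])
```

In the paper the same idea is studied both in the tabular setting and with deep function approximation; the tabular form above is only meant to make the two kinds of update concrete.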
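
The Open Datasets row quotes the use of random Garnet MDPs [2]. For readers unfamiliar with them, a Garnet MDP is a randomly generated finite MDP parameterised by a number of states, a number of actions, and a branching factor; the NumPy sketch below shows the standard construction (parameter names and the reward distribution are illustrative, not the paper's exact generator).

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, seed=0):
    """Sample a random Garnet MDP (generic construction, not the paper's code).

    Each (state, action) pair transitions to `branching` randomly chosen
    successor states with random probabilities; rewards are drawn at random.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))   # transition probabilities
    R = rng.normal(size=(n_states, n_actions))      # per (s, a) rewards
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            P[s, a, successors] = rng.dirichlet(np.ones(branching))
    return P, R

P, R = make_garnet(n_states=10, n_actions=2, branching=3)
```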