Self-Consistent Models and Values
Authors: Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado P. van Hasselt, David Silver
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate self-consistency both in a tabular setting and at scale, in the context of deep RL. |
| Researcher Affiliation | Collaboration | Gregory Farquhar (DeepMind), Kate Baumli (DeepMind), Zita Marinho (DeepMind), Angelos Filos (University of Oxford), Matteo Hessel (DeepMind), Hado van Hasselt (DeepMind), David Silver (DeepMind) |
| Pseudocode | Yes | Algorithm 1: Model-based RL with joint grounded and self-consistency updates. (A hedged tabular sketch of this update scheme appears after the table.) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We do not provide code, but the experimental setup is described in detail in the supplemental material. |
| Open Datasets | Yes | In our first set of experiments, we used random Garnet MDPs [2] to study different combinations of grounded and self-consistent updates for approximate models and values. (A sketch of a Garnet generator appears after the table.) |
| Dataset Splits | No | The paper states, "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please refer to the supplemental material," indicating that such details, including dataset splits, are deferred to the supplemental material rather than given in the main body of the paper. |
| Hardware Specification | No | The paper states, "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to the supplemental material." This indicates that specific hardware details are not provided in the main text. |
| Software Dependencies | No | The paper mentions software such as JAX [9] and the DeepMind JAX ecosystem [3], but it does not provide version numbers for these or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | No | The paper states that "The only difference is in the hyperparameters for batch size and replay buffer, as documented in the Appendix", and the reproducibility checklist confirms that training details such as hyperparameters are deferred to the supplemental material, meaning they are not specified in the main text. |
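
The Pseudocode row above cites Algorithm 1, which interleaves grounded updates (fitting the model and values to real transitions) with self-consistency updates (aligning the value estimates with the learned model's own Bellman backup). Below is a minimal tabular sketch of what such a combination can look like; the function names, step sizes, and the choice to move the values toward the model (rather than the model toward the values) are assumptions of this sketch, not the paper's exact algorithm.

```python
import numpy as np

def grounded_updates(v, P_hat, R_hat, s, a, r, s_next, gamma, lr):
    """Ground values and model in one observed transition (s, a, r, s')."""
    # Grounded TD(0) update for the value estimate.
    v[s] += lr * (r + gamma * v[s_next] - v[s])
    # Grounded model updates toward the empirical transition and reward.
    one_hot = np.zeros_like(P_hat[s, a])
    one_hot[s_next] = 1.0
    P_hat[s, a] += lr * (one_hot - P_hat[s, a])
    R_hat[s, a] += lr * (r - R_hat[s, a])

def self_consistency_update(v, P_hat, R_hat, pi, s, gamma, lr):
    """Move v(s) toward the one-step backup under the *learned* model."""
    # Expected model-predicted return under the policy pi at state s.
    model_backup = np.dot(pi[s], R_hat[s] + gamma * P_hat[s] @ v)
    v[s] += lr * (model_backup - v[s])
```

Here `v` is an `(n_states,)` value table, `P_hat` an `(n_states, n_actions, n_states)` transition model, `R_hat` an `(n_states, n_actions)` reward model, and `pi` an `(n_states, n_actions)` policy. Note that the self-consistency step consumes no new environment data, so it can in principle be applied at states beyond those just visited.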
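
The Open Datasets row notes that the tabular experiments use random Garnet MDPs. For reference, below is a minimal NumPy sketch of a standard Garnet generator (states, actions, and a branching factor controlling how many successor states each state-action pair can reach); the parameter values and the Gaussian reward scheme are assumptions of this sketch, as the paper's exact configuration is in its supplemental material.

```python
import numpy as np

def make_garnet(n_states: int, n_actions: int, branching: int, seed: int = 0):
    """Sample a random Garnet MDP: transitions P[s, a, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            # Each (s, a) pair reaches `branching` distinct successor states.
            successors = rng.choice(n_states, size=branching, replace=False)
            # Probabilities from a random partition of the unit interval.
            cuts = np.sort(rng.uniform(size=branching - 1))
            P[s, a, successors] = np.diff(np.concatenate(([0.0], cuts, [1.0])))
    R = rng.normal(size=(n_states, n_actions))  # assumed Gaussian rewards
    return P, R

# Example: a small random MDP of the kind used in tabular studies.
P, R = make_garnet(n_states=20, n_actions=4, branching=3, seed=42)
```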