Proper Value Equivalence

Authors: Christopher Grimm, Andre Barreto, Greg Farquhar, David Silver, Satinder Singh

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first provide results from tabular experiments on a stochastic version of the Four Rooms domain which serve to corroborate our theoretical claims. Then, we present results from experiments across the full Atari 57 benchmark [3] showcasing that the insights from studying PVE and its relationship to MuZero can provide a benefit in practice at scale.
Researcher Affiliation | Collaboration | Christopher Grimm, Computer Science & Engineering, University of Michigan (crgrimm@umich.edu); Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh, DeepMind ({andrebarreto, gregfar, davidsilver, baveja}@google.com)
Pseudocode | No | The paper describes its algorithms and derivations in prose and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code for the illustrative experiments is available at a URL provided in Appendix A.3.
Open Datasets | Yes | We use the standard OpenAI Gym wrapper for Atari environments [3]. (See the environment-construction sketch below the table.)
Dataset Splits | No | The paper evaluates on the Atari 57 benchmark and states that hyperparameters were not explicitly tuned but taken as defaults from a previous MuZero paper, implying standard evaluation protocols. However, it does not explicitly state train/validation/test dataset splits.
Hardware Specification | Yes | We run our experiments on a custom internal cluster using Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions using the "OpenAI Gym wrapper for Atari environments [3]" but does not specify version numbers for this or any other software dependency.
Experiment Setup | Yes | All experiments ran on 64 actors/1 learner on a custom internal cluster for 500 million frames with a batch size of 2048. For the Four Rooms domain experiments we use a learning rate of 1e-4 and a discount factor of 0.99 for all agents. The model updates are performed via Adam with epsilon 1e-3, beta1 0.9, and beta2 0.999. Atari experiments used a learning rate of 2e-4, Adam with epsilon 1e-3, beta1 0.9, and beta2 0.999. The agent was trained for 500M frames for all games. (See the optimizer configuration sketch below the table.)
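
The Open Datasets row cites the standard OpenAI Gym wrapper for Atari environments. The following is a minimal sketch, assuming a pre-0.26 `gym` API with ALE support and the common DeepMind-style preprocessing wrappers; the game choice and wrapper parameters are illustrative assumptions, not settings taken from the paper.

```python
# Minimal sketch: building an Atari environment through the OpenAI Gym wrapper.
# The game and preprocessing parameters below are illustrative assumptions,
# not values reported in the paper. Assumes a pre-0.26 gym API (4-tuple step).
import gym


def make_atari_env(game: str = "PongNoFrameskip-v4", num_stacked_frames: int = 4):
    env = gym.make(game)
    # Standard Atari 57 preprocessing: frame skipping, grayscale, 84x84 resize.
    env = gym.wrappers.AtariPreprocessing(
        env, frame_skip=4, screen_size=84, grayscale_obs=True
    )
    # Stack recent frames so the agent can infer motion.
    env = gym.wrappers.FrameStack(env, num_stacked_frames)
    return env


if __name__ == "__main__":
    env = make_atari_env()
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
```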
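
The Experiment Setup row reports the optimizer settings (Adam with epsilon 1e-3, betas 0.9/0.999, learning rates 1e-4 for Four Rooms and 2e-4 for Atari, batch size 2048 for Atari, discount 0.99 for the Four Rooms agents). Below is a minimal sketch of wiring those values into a standard PyTorch Adam optimizer; the config names and placeholder network are illustrative, not the authors' implementation.

```python
# Hedged sketch: the numeric values come from the Experiment Setup row above;
# everything else (config names, the placeholder network) is illustrative.
import torch

FOUR_ROOMS_CONFIG = {
    "learning_rate": 1e-4,
    "discount": 0.99,
    "adam_eps": 1e-3,
    "adam_betas": (0.9, 0.999),
}
ATARI_CONFIG = {
    "learning_rate": 2e-4,
    "batch_size": 2048,
    "adam_eps": 1e-3,
    "adam_betas": (0.9, 0.999),
}


def build_optimizer(model: torch.nn.Module, cfg: dict) -> torch.optim.Adam:
    """Create an Adam optimizer with the reported hyperparameters."""
    return torch.optim.Adam(
        model.parameters(),
        lr=cfg["learning_rate"],
        betas=cfg["adam_betas"],
        eps=cfg["adam_eps"],
    )


# Example usage with a throwaway placeholder network (not the paper's model).
model = torch.nn.Linear(64, 1)
optimizer = build_optimizer(model, ATARI_CONFIG)
```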