Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Proper Value Equivalence
Authors: Christopher Grimm, Andre Barreto, Greg Farquhar, David Silver, Satinder Singh
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first provide results from tabular experiments on a stochastic version of the Four Rooms domain which serve to corroborate our theoretical claims. Then, we present results from experiments across the full Atari 57 benchmark [3] showcasing that the insights from studying PVE and its relationship to MuZero can provide a benefit in practice at scale. |
| Researcher Affiliation | Collaboration | Christopher Grimm, Computer Science & Engineering, University of Michigan, EMAIL; Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh, DeepMind, EMAIL |
| Pseudocode | No | The paper describes algorithms and derivations in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for the illustrative experiments is available at a URL provided in Appendix A.3. |
| Open Datasets | Yes | We use the standard OpenAI Gym wrapper for Atari environments [3]. |
| Dataset Splits | No | The paper mentions using the Atari 57 benchmark and states that hyperparameters were not explicitly tuned but rather used default values from a previous MuZero paper, implying standard splits. However, it does not explicitly state the train/validation/test dataset splits within the provided text. |
| Hardware Specification | Yes | We run our experiments on a custom internal cluster using Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions using the "OpenAI Gym wrapper for Atari environments [3]" but does not specify any version numbers for this or other software dependencies. |
| Experiment Setup | Yes | All experiments ran on 64 actors/1 learner on a custom internal cluster for 500 million frames with a batch size of 2048. For the Four Rooms domain experiments we use a learning rate of 1e-4 and a discount factor of 0.99 for all agents. The model updates are performed via Adam with epsilon 1e-3, beta1 0.9, and beta2 0.999. Atari experiments used a learning rate of 2e-4, Adam with epsilon 1e-3, beta1 0.9, and beta2 0.999. The agent was trained for 500M frames for all games. |
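
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration: the class and field names (`AdamConfig`, `ExperimentConfig`, `REPORTED_COMPUTE`) are hypothetical and do not come from the paper or its code, and the values are copied from the row above without filling in anything it leaves unstated. The compute settings are kept separate because the row attributes them to "all experiments" without further detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdamConfig:
    """Adam settings as quoted in the Experiment Setup row."""
    learning_rate: float
    epsilon: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container; None marks values the row does not state."""
    name: str
    optimizer: AdamConfig
    discount: Optional[float] = None
    total_frames: Optional[int] = None


# Four Rooms: learning rate 1e-4 and discount 0.99 for all agents.
FOUR_ROOMS = ExperimentConfig(
    name="four_rooms",
    optimizer=AdamConfig(learning_rate=1e-4),
    discount=0.99,
)

# Atari: learning rate 2e-4, trained for 500M frames on every game.
ATARI = ExperimentConfig(
    name="atari57",
    optimizer=AdamConfig(learning_rate=2e-4),
    total_frames=500_000_000,
)

# Settings the row attributes to "all experiments".
REPORTED_COMPUTE = {
    "num_actors": 64,
    "num_learners": 1,
    "batch_size": 2048,
    "total_frames": 500_000_000,
}

if __name__ == "__main__":
    for cfg in (FOUR_ROOMS, ATARI):
        print(cfg)
    print(REPORTED_COMPUTE)
```

Fields left as `None` (for example the Atari discount factor) are deliberately unfilled, since the quoted text does not report them.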