Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Authors: Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SEED adopts two state of the art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football. |
| Researcher Affiliation | Industry | Brain Team, Google Research |
| Pseudocode | No | The paper includes architectural diagrams but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation along with experiments is open-sourced so results can be reproduced and novel ideas tried out. Github: http://github.com/google-research/seed_rl. |
| Open Datasets | Yes | We evaluate SEED on a number of environments: DeepMind Lab (Beattie et al., 2016), Google Research Football (Kurach et al., 2019) and Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes hyperparameter tuning over sets of hyperparameters and evaluation procedures, but it does not specify a training/validation/test split. Such splits apply to static datasets, and their absence is typical for work on reinforcement learning environments. |
| Hardware Specification | Yes | For evaluating performance, we compare IMPALA using an Nvidia P100 with SEED with multiple accelerator setups. |
| Software Dependencies | Yes | For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. |
| Experiment Setup | Yes | The same set of 24 hyperparameter sets and the same model (ResNet from IMPALA) was used for both agents. More details can be found in Appendix A.1.2. |