Mixtures of Experts Unlock Parameter Scaling for Deep RL
Authors: Johan Samir Obando Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind ²Mila - Québec AI Institute ³Université de Montréal ⁴University of Oxford ⁵McGill University. |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available. |
| Open Datasets | Yes | As in the original papers, we evaluate on 20 games from the Arcade Learning Environment (ALE), a collection of diverse and challenging pixel-based environments (Bellemare et al., 2013b). |
| Dataset Splits | No | The paper describes how experiments were run (e.g., '5 independent seeds', '95% stratified bootstrap confidence intervals') and reports the interquartile mean, but it does not specify any training/validation/test dataset splits in terms of data partitioning for model validation. |
| Hardware Specification | Yes | All experiments were run on NVIDIA Tesla P100 GPUs |
| Software Dependencies | No | The paper mentions 'Dopamine library' and lists general Python tools like 'NumPy', 'Matplotlib', 'Jupyter', 'Pandas', and 'JAX' in the acknowledgements. However, it does not provide specific version numbers for these software components, which are required for reproducibility. |
| Experiment Setup | Yes | Appendix B, titled 'Hyper-parameters list', provides detailed tables (Tables 1-4) of default hyper-parameter settings for various agents (DER, DrQ(ε), DQN, Rainbow, CQL, CQL+C51) and neural network architectures. These tables specify values for batch size, learning rate, discount factor, replay capacity, activation functions, network dimensions, and other configuration details. |
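
The 'Research Type' row above quotes the paper's central technique: incorporating Soft MoE modules (Puigcerver et al., 2023) into value-based networks. Since the paper's codebase builds on JAX, the following is a minimal, illustrative Flax sketch of a Soft MoE layer with one slot per expert; it is not the authors' released implementation, and the class name `SoftMoE`, its parameter names, and the toy dimensions are assumptions made for this example.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class SoftMoE(nn.Module):
    """Illustrative Soft MoE layer, one slot per expert (hypothetical sketch)."""
    num_experts: int
    expert_hidden: int
    out_features: int

    @nn.compact
    def __call__(self, tokens):  # tokens: [num_tokens, token_dim]
        token_dim = tokens.shape[-1]
        # Learnable per-slot parameters used to compute dispatch/combine logits.
        phi = self.param("phi", nn.initializers.lecun_normal(),
                         (token_dim, self.num_experts))
        logits = tokens @ phi                      # [num_tokens, num_experts]
        dispatch = jax.nn.softmax(logits, axis=0)  # normalized over tokens (per slot)
        combine = jax.nn.softmax(logits, axis=1)   # normalized over slots (per token)
        slots = dispatch.T @ tokens                # [num_experts, token_dim]
        # Each expert is a small MLP applied to its own slot.
        outputs = []
        for e in range(self.num_experts):
            h = nn.relu(nn.Dense(self.expert_hidden, name=f"expert_{e}_hidden")(slots[e]))
            outputs.append(nn.Dense(self.out_features, name=f"expert_{e}_out")(h))
        expert_out = jnp.stack(outputs)            # [num_experts, out_features]
        # Each token's output is a convex combination of the expert outputs.
        return combine @ expert_out                # [num_tokens, out_features]


# Toy usage: treat a flattened conv feature map of a value network as tokens.
tokens = jnp.ones((49, 64))
layer = SoftMoE(num_experts=4, expert_hidden=128, out_features=64)
params = layer.init(jax.random.PRNGKey(0), tokens)
out = layer.apply(params, tokens)  # shape (49, 64)
```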
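
The 'Dataset Splits' row notes that results are aggregated over 5 independent seeds with 95% stratified bootstrap confidence intervals on the interquartile mean (IQM), a protocol commonly computed with the rliable library. The plain-NumPy sketch below illustrates that computation under those assumptions; the function names and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: the mean of the middle 50% of all scores."""
    flat = np.sort(scores, axis=None)
    q = len(flat) // 4
    return float(flat[q: len(flat) - q].mean())


def stratified_bootstrap_iqm(score_matrix: np.ndarray, n_boot: int = 2000,
                             alpha: float = 0.05, seed: int = 0):
    """Point estimate and (1 - alpha) stratified bootstrap CI for the IQM.

    score_matrix has shape [num_runs, num_games]; runs are resampled with
    replacement independently within each game (stratum).
    """
    rng = np.random.default_rng(seed)
    num_runs, num_games = score_matrix.shape
    boot_stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, num_runs, size=(num_runs, num_games))
        resampled = np.take_along_axis(score_matrix, idx, axis=0)
        boot_stats.append(iqm(resampled))
    lo, hi = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(score_matrix), (lo, hi)


# Toy usage: 5 seeds x 20 games of normalized scores (random placeholder data).
scores = np.random.default_rng(1).uniform(0.0, 2.0, size=(5, 20))
point, (lo, hi) = stratified_bootstrap_iqm(scores)
```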