Mixtures of Experts Unlock Parameter Scaling for Deep RL

Authors: Johan Samir Obando Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning. (Soft MoE routing is sketched below the table.)
Researcher Affiliation | Collaboration | 1Google DeepMind, 2Mila - Québec AI Institute, 3Université de Montréal, 4University of Oxford, 5McGill University.
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | We make our code publicly available.
Open Datasets | Yes | As in the original papers, we evaluate on 20 games from the Arcade Learning Environment (ALE), a collection of diverse and challenging pixel-based environments (Bellemare et al., 2013b).
Dataset Splits | No | The paper describes how experiments were run (e.g., '5 independent seeds', '95% stratified bootstrap confidence intervals') and reports the interquartile mean (these statistics are sketched below the table), but it does not specify any training/validation/test dataset splits in terms of data partitioning for model validation.
Hardware Specification | Yes | All experiments were run on NVIDIA Tesla P100 GPUs.
Software Dependencies | No | The paper mentions the 'Dopamine library' and lists general Python tools like 'NumPy', 'Matplotlib', 'Jupyter', 'Pandas', and 'JAX' in the acknowledgements. However, it does not provide specific version numbers for these software components, which are required for reproducibility.
Experiment Setup | Yes | Appendix B, titled 'Hyper-parameters list', provides detailed tables (Tables 1, 2, 3, 4) of default hyper-parameter settings for various agents (DER, DrQ(ϵ), DQN, Rainbow, CQL, CQL+C51) and neural network architectures. These tables specify values for batch size, learning rate, discount factor, replay capacity, activation functions, network dimensions, and other configuration details.
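
The Research Type row quotes the paper's central architectural change: inserting Soft MoE modules into value-based networks. As a rough illustration of the routing Soft MoE performs, the following is a minimal NumPy sketch of a single Soft MoE forward pass following Puigcerver et al. (2023). The toy linear experts, tensor shapes, and the name soft_moe_forward are illustrative assumptions; they are not the paper's actual implementation, which lives inside JAX-based Dopamine value networks.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_forward(tokens, phi, expert_weights):
    """Single Soft MoE forward pass (Puigcerver et al., 2023), toy version.

    tokens:         (n_tokens, d)            input token representations
    phi:            (d, n_experts, n_slots)  learned routing parameters
    expert_weights: (n_experts, d, d)        one toy linear expert per row
    """
    n_tokens, d = tokens.shape
    n_experts, n_slots = phi.shape[1], phi.shape[2]
    logits = np.einsum('td,des->tes', tokens, phi)       # routing logits
    # Dispatch weights: softmax over tokens, so each slot becomes a convex
    # combination of the input tokens.
    dispatch = softmax(logits, axis=0)
    # Combine weights: softmax over all (expert, slot) pairs for each token.
    combine = softmax(logits.reshape(n_tokens, -1), axis=1)
    combine = combine.reshape(n_tokens, n_experts, n_slots)
    # Form the slots and let each expert process its own slots.
    slots = np.einsum('tes,td->esd', dispatch, tokens)   # (experts, slots, d)
    expert_out = np.einsum('esd,edk->esk', slots, expert_weights)
    # Each output token mixes all expert outputs with its combine weights.
    return np.einsum('tes,esk->tk', combine, expert_out)

# Tiny usage example with random parameters (shapes chosen for illustration).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))         # 16 tokens of width 32
phi = rng.normal(size=(32, 4, 2))          # 4 experts, 2 slots each
experts = rng.normal(size=(4, 32, 32)) * 0.1
print(soft_moe_forward(tokens, phi, experts).shape)  # (16, 32)
```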
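
The Dataset Splits row notes that evaluation rests on aggregate statistics over independent seeds rather than data partitions: interquartile means with 95% stratified bootstrap confidence intervals. Below is a minimal NumPy sketch of those two statistics; the games-by-seeds array layout, the resampling scheme, and the function names are assumptions for illustration, and the paper's actual reporting presumably follows the rliable conventions of Agarwal et al. (2021) rather than this exact code.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: average of the middle 50% of the values."""
    x = np.sort(np.asarray(scores).ravel())
    n = len(x)
    return x[n // 4 : n - n // 4].mean()

def stratified_bootstrap_iqm(scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the IQM.

    scores: (n_games, n_seeds) array of normalized per-game scores.
    Resampling is stratified: seeds are drawn with replacement within
    each game, so every game stays represented in every replicate.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    n_games, n_seeds = scores.shape
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_seeds, size=(n_games, n_seeds))
        replicates[b] = iqm(np.take_along_axis(scores, idx, axis=1))
    lo, hi = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(scores), (lo, hi)

# Example: 20 games x 5 seeds of synthetic (not real) normalized scores.
rng = np.random.default_rng(1)
fake_scores = rng.gamma(shape=2.0, scale=0.5, size=(20, 5))
point, (lo, hi) = stratified_bootstrap_iqm(fake_scores)
print(f"IQM = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```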