Mixtures of Experts Unlock Parameter Scaling for Deep RL
Authors: Johan Samir Obando Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind ²Mila - Québec AI Institute ³Université de Montréal ⁴University of Oxford ⁵McGill University. |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available. |
| Open Datasets | Yes | As in the original papers, we evaluate on 20 games from the Arcade Learning Environment (ALE), a collection of diverse and challenging pixel-based environments (Bellemare et al., 2013b). |
| Dataset Splits | No | The paper describes how experiments were run (e.g., '5 independent seeds', '95% stratified bootstrap confidence intervals') and reports the interquartile mean, but it does not specify any training/validation/test dataset splits in terms of data partitioning for model validation. |
| Hardware Specification | Yes | All experiments were run on NVIDIA Tesla P100 GPUs |
| Software Dependencies | No | The paper mentions 'Dopamine library' and lists general Python tools like 'NumPy', 'Matplotlib', 'Jupyter', 'Pandas', and 'JAX' in the acknowledgements. However, it does not provide specific version numbers for these software components, which are required for reproducibility. |
| Experiment Setup | Yes | Appendix B, titled 'Hyper-parameters list', provides detailed tables (Tables 1-4) of default hyper-parameter settings for various agents (DER, DrQ(ε), DQN, Rainbow, CQL, CQL+C51) and neural network architectures. These tables specify values for batch size, learning rate, discount factor, replay capacity, activation functions, network dimensions, and other configuration details. |
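
The 'Research Type' row above quotes the paper's central technique: incorporating Soft MoE modules (Puigcerver et al., 2023) into value-based networks. Since the paper's codebase builds on JAX, the following is a minimal, illustrative Flax sketch of a Soft MoE layer with one slot per expert; it is not the authors' released implementation, and the class name `SoftMoE`, its parameter names, and the toy dimensions are assumptions made for this example.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class SoftMoE(nn.Module):
    """Illustrative Soft MoE layer, one slot per expert (hypothetical sketch)."""
    num_experts: int
    expert_hidden: int
    out_features: int

    @nn.compact
    def __call__(self, tokens):  # tokens: [num_tokens, token_dim]
        token_dim = tokens.shape[-1]
        # Learnable per-slot parameters used to compute dispatch/combine logits.
        phi = self.param("phi", nn.initializers.lecun_normal(),
                         (token_dim, self.num_experts))
        logits = tokens @ phi                      # [num_tokens, num_experts]
        dispatch = jax.nn.softmax(logits, axis=0)  # normalized over tokens (per slot)
        combine = jax.nn.softmax(logits, axis=1)   # normalized over slots (per token)
        slots = dispatch.T @ tokens                # [num_experts, token_dim]
        # Each expert is a small MLP applied to its own slot.
        outputs = []
        for e in range(self.num_experts):
            h = nn.relu(nn.Dense(self.expert_hidden, name=f"expert_{e}_hidden")(slots[e]))
            outputs.append(nn.Dense(self.out_features, name=f"expert_{e}_out")(h))
        expert_out = jnp.stack(outputs)            # [num_experts, out_features]
        # Each token's output is a convex combination of the expert outputs.
        return combine @ expert_out                # [num_tokens, out_features]


# Toy usage: treat a flattened conv feature map of a value network as tokens.
tokens = jnp.ones((49, 64))
layer = SoftMoE(num_experts=4, expert_hidden=128, out_features=64)
params = layer.init(jax.random.PRNGKey(0), tokens)
out = layer.apply(params, tokens)  # shape (49, 64)
```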
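
The 'Dataset Splits' row notes that results are aggregated over 5 independent seeds with 95% stratified bootstrap confidence intervals on the interquartile mean (IQM), a protocol commonly computed with the rliable library. The plain-NumPy sketch below illustrates that computation under those assumptions; the function names and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: the mean of the middle 50% of all scores."""
    flat = np.sort(scores, axis=None)
    q = len(flat) // 4
    return float(flat[q: len(flat) - q].mean())


def stratified_bootstrap_iqm(score_matrix: np.ndarray, n_boot: int = 2000,
                             alpha: float = 0.05, seed: int = 0):
    """Point estimate and (1 - alpha) stratified bootstrap CI for the IQM.

    score_matrix has shape [num_runs, num_games]; runs are resampled with
    replacement independently within each game (stratum).
    """
    rng = np.random.default_rng(seed)
    num_runs, num_games = score_matrix.shape
    boot_stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, num_runs, size=(num_runs, num_games))
        resampled = np.take_along_axis(score_matrix, idx, axis=0)
        boot_stats.append(iqm(resampled))
    lo, hi = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return iqm(score_matrix), (lo, hi)


# Toy usage: 5 seeds x 20 games of normalized scores (random placeholder data).
scores = np.random.default_rng(1).uniform(0.0, 2.0, size=(5, 20))
point, (lo, hi) = stratified_bootstrap_iqm(scores)
```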