Muesli: Combining Improvements in Policy Optimization

Authors: Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "The majority of our experiments were performed on 57 classic Atari games from the Arcade Learning Environment... To help understand the different design choices made in Muesli, our experiments on Atari include multiple ablations of our proposed update. Additionally, to evaluate how well our method generalises to different domains, we performed experiments on a suite of continuous control environments... We also conducted experiments in 9x9 Go in self-play..." |
| Researcher Affiliation | Collaboration | "¹DeepMind, London, UK, ²University College London. Correspondence to: Matteo Hessel <mtthss@google.com>, Ivo Danihelka <danihelka@google.com>, Hado van Hasselt <hado@google.com>." |
| Pseudocode | No | The paper describes its methods mathematically and in prose, but provides no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the JAX, Optax, Haiku, and Rlax libraries, but it neither states that the source code for the Muesli method will be released nor links to a code repository. |
| Open Datasets | Yes | "The majority of our experiments were performed on 57 classic Atari games from the Arcade Learning Environment (Bellemare et al., 2013; Machado et al., 2018)... Additionally, to evaluate how well our method generalises to different domains, we performed experiments on a suite of continuous control environments (based on MuJoCo and sourced from the OpenAI Gym (Brockman et al., 2016))." |
| Dataset Splits | No | The paper mentions training agents with uniform experience replay and multi-step returns, but it does not specify explicit training/validation/test splits, nor percentages or sample counts for validation. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing resources. |
| Software Dependencies | No | The paper mentions the JAX, Optax, Haiku, and Rlax libraries, but it does not give their version numbers, which a reproducible description of software dependencies requires. |
| Experiment Setup | Yes | "We used c = 1 in our experiments, across all domains... We used λ = 1 in all other experiments reported in the paper... All agents in this section are trained using the Sebulba podracer architecture (Hessel et al., 2021)... the model described in Section 4.3 is parametrized by an LSTM (Hochreiter & Schmidhuber, 1997)... Agents are trained using uniform experience replay, and estimate multi-step returns using Retrace (Munos et al., 2016)." Illustrative sketches of the clipped-advantage transform and the Retrace targets appear below the table. |
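
For context on the quoted hyper-parameter c: in Muesli, c is the threshold that clips the advantage estimates inside the CMPO target policy, π_CMPO(a|s) ∝ π_prior(a|s) · exp(clip(adv(s,a), −c, c)). The following is a minimal JAX sketch of that transform for discrete actions; the function name `cmpo_target_policy` and its argument layout are illustrative assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def cmpo_target_policy(prior_logits, advantages, c=1.0):
    """Illustrative sketch of a clipped-MPO (CMPO) target policy.

    pi_cmpo(a|s) is proportional to pi_prior(a|s) * exp(clip(adv(s,a), -c, c)).
    Adding the clipped advantages to the prior logits and re-normalising
    with a softmax implements exactly that product.

    prior_logits: logits of the prior policy,      shape [num_actions]
    advantages:   advantage estimates adv(s, a),   shape [num_actions]
    c:            clipping threshold (the paper reports c = 1 in all domains)
    """
    clipped_adv = jnp.clip(advantages, -c, c)
    return jax.nn.softmax(prior_logits + clipped_adv)
```

Clipping keeps each exponentiated advantage within [e^−c, e^c], which bounds how far the target policy can move away from the prior regardless of the scale of the advantage estimates.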
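
The same row cites Retrace (Munos et al., 2016), with λ = 1, for multi-step return estimation. Below is a minimal JAX sketch of how Retrace(λ) targets can be computed over a length-T trajectory via a backward recursion over temporal-difference errors; the function name `retrace_targets` and its argument conventions are assumptions (the paper's agents build on the Rlax library, whose implementation may differ).

```python
import jax
import jax.numpy as jnp

def retrace_targets(q_t, v_tp1, r_t, discount_t, log_rhos, lambda_=1.0):
    """Illustrative sketch of Retrace(lambda) return targets (Munos et al., 2016).

    q_t:        Q(s_t, a_t) for the actions actually taken,       shape [T]
    v_tp1:      E_{a~pi} Q(s_{t+1}, a) bootstrap values,          shape [T]
    r_t:        rewards,                                          shape [T]
    discount_t: per-step discounts (0 at episode boundaries),     shape [T]
    log_rhos:   log(pi(a_t|s_t) / mu(a_t|s_t)) importance ratios, shape [T]
    """
    # Truncated importance weights c_t = lambda * min(1, pi / mu).
    c_t = lambda_ * jnp.minimum(1.0, jnp.exp(log_rhos))
    # One-step TD errors delta_t = r_t + gamma_t * v_{t+1} - q_t.
    deltas = r_t + discount_t * v_tp1 - q_t
    # The recursion needs c_{t+1}; pad with 0 so the final step just bootstraps.
    c_tp1 = jnp.concatenate([c_t[1:], jnp.zeros(1)])

    def backward_step(acc, xs):
        delta, discount, c = xs
        # Delta_t = delta_t + gamma_t * c_{t+1} * Delta_{t+1}
        acc = delta + discount * c * acc
        return acc, acc

    _, corrections = jax.lax.scan(
        backward_step, jnp.zeros(()), (deltas, discount_t, c_tp1), reverse=True)
    return q_t + corrections  # G_t = Q(s_t, a_t) + Delta_t
```

With λ = 1 and on-policy data (π = μ) the truncated weights c_t are all 1, so the targets reduce to uncorrected multi-step returns; off-policy, the min(1, π/μ) truncation keeps the correction terms from exploding.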