Learning and Planning in Complex Action Spaces
Authors: Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite. |
| Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Thomas Hubert <tkhubert@google.com>. |
| Pseudocode | No | The algorithm, Sampled MuZero, is described in prose in Section 5, which details its modifications to MuZero, but no formal pseudocode block or algorithm listing is given (a hedged sketch of the core idea follows this table). |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the public availability of its source code. |
| Open Datasets | Yes | To demonstrate the generality of this approach, we apply our algorithm to two continuous control benchmark domains, the DeepMind Control Suite (Tassa et al., 2018) and Real-World RL Suite (Dulac-Arnold et al., 2020). |
| Dataset Splits | No | The paper mentions using '3 seeds per experiment' and refers to 'data budgets' and 'task classification' from other papers but does not provide specific train/validation/test dataset split information (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify the CPU, GPU models, memory, or any other specific hardware used for running the experiments. |
| Software Dependencies | No | The paper states 'All models are implemented in JAX (Bradbury et al., 2018) using Haiku (Hennigan et al., 2020)', but these citations identify the software packages themselves and do not give the specific versions of JAX or Haiku used in the experiments. No other software dependencies with version numbers are listed (a version-logging snippet follows this table). |
| Experiment Setup | Yes | Appendix A.3, Table 3 lists all hyperparameters used across all experiments, providing specific values for batch size, discount, learning rate schedule parameters (warmup steps, decay rate), Adam optimizer parameters (epsilon, beta1, beta2, weight decay), observation stack, LSTM hidden size, number of simulations, and various loss coefficients. |
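
Since the paper describes Sampled MuZero only in prose (see the Pseudocode row above), here is a minimal sketch of its central idea: sample K actions from a proposal distribution β (assumed here to be the network's policy prior π, the paper's default choice), restrict the tree search to that sampled subset, and use the importance-corrected prior (β̂/β)·π as the search prior, where β̂ is the empirical distribution of the samples. The function and variable names below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def sampled_search_prior(prior_logits, num_samples=20, rng=None):
    """Hedged sketch of Sampled MuZero's action-sampling step.

    With the proposal beta taken equal to the policy prior pi, the
    corrected search prior (beta_hat / beta) * pi reduces to beta_hat,
    the empirical distribution of the sampled actions.
    """
    if rng is None:
        rng = np.random.default_rng()

    # Policy prior over the full (possibly very large) action space.
    pi = np.exp(prior_logits - prior_logits.max())
    pi /= pi.sum()

    # Draw K actions (with replacement) from the proposal beta = pi.
    samples = rng.choice(len(pi), size=num_samples, p=pi)
    actions, counts = np.unique(samples, return_counts=True)

    beta_hat = counts / num_samples   # empirical sample distribution
    beta = pi[actions]                # proposal probability of each sampled action
    corrected = (beta_hat / beta) * pi[actions]
    corrected /= corrected.sum()      # search prior over the sampled subset only
    return actions, corrected
```

The search then runs MCTS over `actions` only, using `corrected` in place of the full prior, and the policy training target becomes the normalized visit-count distribution over this sampled subset.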
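
Relatedly, because the Software Dependencies row notes that no versions are pinned, a reproduction should at least record the library versions it actually runs under. A minimal snippet, assuming standard `jax` and `dm-haiku` installations:

```python
import jax
import haiku as hk

# Log exact versions alongside experiment results; the paper cites
# JAX and Haiku but does not state which versions were used.
print(f"jax=={jax.__version__}, haiku=={hk.__version__}")
```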