Meta-Q-Learning
Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the experimental results of MQL. We first discuss the setup and provide details of the benchmark in Sec. 4.1. This is followed by empirical results and ablation experiments in Sec. 4.2. |
| Researcher Affiliation | Collaboration | 1 Amazon Web Services, 2 University of Pennsylvania. Email: {fakoor, soattos, smola}@amazon.com, pratikac@seas.upenn.edu |
| Pseudocode | Yes | The pseudo-code for MQL during training and adaptation is given in Algorithm 1 and Algorithm 2. |
| Open Source Code | No | The paper states that numbers for MAML and PEARL were obtained from 'training logs published by Rakelly et al. (2019)' and 'published code by Rakelly et al. (2019)', referring to third-party code. It does not provide an explicit statement or link to the authors' own source code for the methodology described in this paper. |
| Open Datasets | Yes | Tasks and algorithms: We use the MuJoCo (Todorov et al., 2012) simulator with OpenAI Gym (Brockman et al., 2016) on continuous-control meta-RL benchmark tasks. These tasks have different rewards, randomized system parameters (Walker-2D-Params) and have been used in previous papers such as Finn et al. (2017); Rothfuss et al. (2018); Rakelly et al. (2019). |
| Dataset Splits | Yes | For each environment, Rakelly et al. (2019) constructed a fixed set of meta-training tasks (Dmeta) and a validation set of tasks Dnew that are disjoint from the meta-training set. To enable direct comparison with published empirical results, we closely followed the evaluation code of Rakelly et al. (2019) to create these tasks. We also use the exact same evaluation protocol as that of these authors, e.g., 200 timesteps of data from the new task, or the number of evaluation episodes. We report the undiscounted return on the validation tasks with statistics computed across 5 random seeds. (See the protocol sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for optimizing all loss functions in Section 4.1. However, it does not provide a specific version number for Adam or any other software dependencies (e.g., Python, or specific libraries such as PyTorch or TensorFlow) required for replication. |
| Experiment Setup | Yes | Hyper-parameters for these tasks are provided in Appendix D. ... Table 1: Hyper-parameters for MQL and TD3 for continuous-control meta-RL benchmark tasks. We use a network with two fully-connected layers for all environments. The batch-size in Adam is fixed to 256 for all environments. (A minimal sketch of this setup follows the table.) |
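
The Experiment Setup row quotes only the architecture and the Adam batch size. Below is a minimal sketch, in PyTorch, of what such a setup might look like; the hidden width, learning rate, activation choices, and the `obs_dim`/`act_dim` placeholders are illustrative assumptions and are not taken from the paper. Only the two fully-connected hidden layers and the Adam batch size of 256 follow the quoted text.

```python
# Minimal sketch of the quoted setup: a policy with two fully-connected
# hidden layers, optimized with Adam at batch size 256.
# Hidden width (300), learning rate (3e-4), tanh squashing, and the
# obs_dim/act_dim placeholders are illustrative assumptions, not paper values.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),   # first fully-connected layer
            nn.Linear(hidden, hidden), nn.ReLU(),    # second fully-connected layer
            nn.Linear(hidden, act_dim), nn.Tanh(),   # action head, squashed to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

obs_dim, act_dim = 17, 6          # example dimensions (assumed, not from the paper)
policy = Policy(obs_dim, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # lr is an assumption
BATCH_SIZE = 256                  # Adam batch size fixed to 256, as stated

# One illustrative gradient step on a dummy batch; the real objective is the
# MQL/TD3 loss described in the paper, not this placeholder.
obs_batch = torch.randn(BATCH_SIZE, obs_dim)
loss = policy(obs_batch).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```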
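
The Dataset Splits row summarizes the evaluation protocol: held-out validation tasks disjoint from meta-training, 200 timesteps of adaptation data per new task, and undiscounted returns with statistics across 5 random seeds. The sketch below shows one way that reporting loop could be organized; the `make_validation_tasks`, `adapt`, and `evaluate_return` helpers are hypothetical placeholders standing in for the evaluation code of Rakelly et al. (2019), not code released by the authors.

```python
# Minimal sketch of the reported evaluation protocol: undiscounted return on
# held-out validation tasks, averaged across 5 random seeds. All helper
# callables are hypothetical placeholders.
import numpy as np

NUM_SEEDS = 5            # statistics computed across 5 random seeds, as stated
ADAPT_TIMESTEPS = 200    # 200 timesteps of data from the new task, as stated

def evaluate(policy_factory, make_validation_tasks, adapt, evaluate_return):
    per_seed_returns = []
    for seed in range(NUM_SEEDS):
        np.random.seed(seed)
        tasks = make_validation_tasks(seed)      # validation tasks, disjoint from Dmeta
        returns = []
        for task in tasks:
            policy = policy_factory()
            adapted = adapt(policy, task, timesteps=ADAPT_TIMESTEPS)
            returns.append(evaluate_return(adapted, task))  # undiscounted return
        per_seed_returns.append(np.mean(returns))
    return np.mean(per_seed_returns), np.std(per_seed_returns)
```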