Meta-Q-Learning

Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section presents the experimental results of MQL. We first discuss the setup and provide details of the benchmark in Sec. 4.1. This is followed by empirical results and ablation experiments in Sec. 4.2.
Researcher Affiliation | Collaboration | 1 Amazon Web Services, 2 University of Pennsylvania. Email: {fakoor, soattos, smola}@amazon.com, pratikac@seas.upenn.edu
Pseudocode | Yes | The pseudo-code for MQL during training and adaptation is given in Algorithm 1 and Algorithm 2.
Open Source Code | No | The paper states that numbers for MAML and PEARL were obtained from 'training logs published by Rakelly et al. (2019)' and 'published code by Rakelly et al. (2019)', referring to third-party code. It does not provide an explicit statement or link to the authors' own source code for the methodology described in this paper.
Open Datasets | Yes | Tasks and algorithms: We use the MuJoCo (Todorov et al., 2012) simulator with OpenAI Gym (Brockman et al., 2016) on continuous-control meta-RL benchmark tasks. These tasks have different rewards and randomized system parameters (Walker-2D-Params), and have been used in previous papers such as Finn et al. (2017); Rothfuss et al. (2018); Rakelly et al. (2019).
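The environments themselves are standard and publicly available. Below is a minimal sketch of instantiating one of the MuJoCo continuous-control environments through OpenAI Gym, assuming the pre-0.26 Gym reset/step API that was current when the paper was published; the meta-RL task variants (e.g., the randomized system parameters of Walker-2D-Params) are constructed by the benchmark code of Rakelly et al. (2019) and are not reproduced here.

```python
# Minimal sketch: a standard Gym MuJoCo environment as used by the benchmark.
# Assumes the older Gym API (reset() -> obs, step() -> (obs, reward, done, info)).
import gym

env = gym.make("Walker2d-v2")           # base environment behind Walker-2D-Params
obs = env.reset()
for _ in range(200):                    # 200 timesteps, mirroring the adaptation budget
    action = env.action_space.sample()  # random actions as a placeholder policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```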
Dataset Splits | Yes | For each environment, Rakelly et al. (2019) constructed a fixed set of meta-training tasks (Dmeta) and a validation set of tasks Dnew that are disjoint from the meta-training set. To enable direct comparison with published empirical results, we closely followed the evaluation code of Rakelly et al. (2019) to create these tasks. We also use the exact same evaluation protocol as that of these authors, e.g., 200 timesteps of data from the new task, or the number of evaluation episodes. We report the undiscounted return on the validation tasks with statistics computed across 5 random seeds.
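A minimal sketch of the reporting side of this protocol, assuming only what is quoted above: undiscounted (raw-reward) episode returns on the validation tasks, summarized by the mean and standard deviation of per-seed averages across the 5 random seeds. The helper names are illustrative, and the MQL adaptation step that produces the evaluated policy from 200 timesteps of new-task data is not shown.

```python
# Sketch of the evaluation statistics: undiscounted returns, aggregated per seed.
import numpy as np

def undiscounted_return(env, policy, max_steps=1000):
    """Sum of raw rewards over one episode, no discounting (older Gym step API)."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += float(reward)
        if done:
            break
    return total

def seed_statistics(per_seed_returns):
    """per_seed_returns: one list of validation-task episode returns per random seed
    (5 seeds in the protocol above). Reports mean and std of the per-seed averages."""
    per_seed_means = [float(np.mean(r)) for r in per_seed_returns]
    return float(np.mean(per_seed_means)), float(np.std(per_seed_means))
```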
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' to optimize all loss functions in Section 4.1, but it does not provide version numbers for any software dependencies (e.g., Python, or libraries such as PyTorch or TensorFlow) required for replication.
Experiment Setup | Yes | Hyper-parameters for these tasks are provided in Appendix D. ... Table 1: Hyper-parameters for MQL and TD3 for continuous-control meta-RL benchmark tasks. We use a network with two fully-connected layers for all environments. The batch-size in Adam is fixed to 256 for all environments.
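A minimal PyTorch sketch consistent with the quoted hyper-parameters: a network with two fully-connected hidden layers and an Adam optimizer used with batch size 256. The hidden width (400), learning rate (3e-4), and the example Walker2d observation/action dimensions are illustrative assumptions, not values quoted here; see Appendix D of the paper for the actual settings.

```python
# Sketch of the stated architecture and optimizer choice (assumed width / learning rate).
import torch
import torch.nn as nn

class TwoLayerPolicy(nn.Module):
    """Two fully-connected hidden layers, matching the quoted Table 1 description."""
    def __init__(self, obs_dim, act_dim, hidden=400):   # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),       # continuous actions in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

policy = TwoLayerPolicy(obs_dim=17, act_dim=6)              # e.g. Walker2d dimensions
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # lr is an assumption
batch_size = 256                                            # fixed for all environments
```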