Meta-Q-Learning
Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the experimental results of MQL. We first discuss the setup and provide details of the benchmark in Sec. 4.1. This is followed by empirical results and ablation experiments in Sec. 4.2. |
| Researcher Affiliation | Collaboration | 1 Amazon Web Services, 2 University of Pennsylvania. Email: {fakoor, soattos, smola}@amazon.com, pratikac@seas.upenn.edu |
| Pseudocode | Yes | The pseudo-code for MQL during training and adaptation is given in Algorithm 1 and Algorithm 2. |
| Open Source Code | No | The paper states that numbers for MAML and PEARL were obtained from 'training logs published by Rakelly et al. (2019)' and 'published code by Rakelly et al. (2019)', referring to third-party code. It does not provide an explicit statement or link to the authors' own source code for the methodology described in this paper. |
| Open Datasets | Yes | Tasks and algorithms: We use the MuJoCo (Todorov et al., 2012) simulator with OpenAI Gym (Brockman et al., 2016) on continuous-control meta-RL benchmark tasks. These tasks have different rewards, randomized system parameters (Walker-2D-Params) and have been used in previous papers such as Finn et al. (2017); Rothfuss et al. (2018); Rakelly et al. (2019). |
| Dataset Splits | Yes | For each environment, Rakelly et al. (2019) constructed a fixed set of meta-training tasks (Dmeta) and a validation set of tasks Dnew that are disjoint from the meta-training set. To enable direct comparison with published empirical results, we closely followed the evaluation code of Rakelly et al. (2019) to create these tasks. We also use the exact same evaluation protocol as that of these authors, e.g., 200 timesteps of data from the new task, or the number of evaluation episodes. We report the undiscounted return on the validation tasks with statistics computed across 5 random seeds. (See the protocol sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for optimizing all loss functions in Section 4.1. However, it does not provide a specific version number for Adam or any other software dependencies (e.g., Python, or specific libraries such as PyTorch or TensorFlow) required for replication. |
| Experiment Setup | Yes | Hyper-parameters for these tasks are provided in Appendix D. ... Table 1: Hyper-parameters for MQL and TD3 for continuous-control meta-RL benchmark tasks. We use a network with two fully-connected layers for all environments. The batch-size in Adam is fixed to 256 for all environments. (A minimal sketch of this setup follows the table.) |
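
The Experiment Setup row quotes only the architecture and the Adam batch size. Below is a minimal sketch, in PyTorch, of what such a setup might look like; the hidden width, learning rate, activation choices, and the `obs_dim`/`act_dim` placeholders are illustrative assumptions and are not taken from the paper. Only the two fully-connected hidden layers and the Adam batch size of 256 follow the quoted text.

```python
# Minimal sketch of the quoted setup: a policy with two fully-connected
# hidden layers, optimized with Adam at batch size 256.
# Hidden width (300), learning rate (3e-4), tanh squashing, and the
# obs_dim/act_dim placeholders are illustrative assumptions, not paper values.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),   # first fully-connected layer
            nn.Linear(hidden, hidden), nn.ReLU(),    # second fully-connected layer
            nn.Linear(hidden, act_dim), nn.Tanh(),   # action head, squashed to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

obs_dim, act_dim = 17, 6          # example dimensions (assumed, not from the paper)
policy = Policy(obs_dim, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # lr is an assumption
BATCH_SIZE = 256                  # Adam batch size fixed to 256, as stated

# One illustrative gradient step on a dummy batch; the real objective is the
# MQL/TD3 loss described in the paper, not this placeholder.
obs_batch = torch.randn(BATCH_SIZE, obs_dim)
loss = policy(obs_batch).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```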
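
The Dataset Splits row summarizes the evaluation protocol: held-out validation tasks disjoint from meta-training, 200 timesteps of adaptation data per new task, and undiscounted returns with statistics across 5 random seeds. The sketch below shows one way that reporting loop could be organized; the `make_validation_tasks`, `adapt`, and `evaluate_return` helpers are hypothetical placeholders standing in for the evaluation code of Rakelly et al. (2019), not code released by the authors.

```python
# Minimal sketch of the reported evaluation protocol: undiscounted return on
# held-out validation tasks, averaged across 5 random seeds. All helper
# callables are hypothetical placeholders.
import numpy as np

NUM_SEEDS = 5            # statistics computed across 5 random seeds, as stated
ADAPT_TIMESTEPS = 200    # 200 timesteps of data from the new task, as stated

def evaluate(policy_factory, make_validation_tasks, adapt, evaluate_return):
    per_seed_returns = []
    for seed in range(NUM_SEEDS):
        np.random.seed(seed)
        tasks = make_validation_tasks(seed)      # validation tasks, disjoint from Dmeta
        returns = []
        for task in tasks:
            policy = policy_factory()
            adapted = adapt(policy, task, timesteps=ADAPT_TIMESTEPS)
            returns.append(evaluate_return(adapted, task))  # undiscounted return
        per_seed_returns.append(np.mean(returns))
    return np.mean(per_seed_returns), np.std(per_seed_returns)
```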