On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning
Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare SG-MRL and MAML in several deep RL environments. We also empirically validate the proposed SG-MRL algorithm in larger-scale environments standard in modern reinforcement learning applications, including a 2D-navigation problem and a more challenging locomotion problem simulated with the MuJoCo library. |
| Researcher Affiliation | Academia | Alireza Fallah, EECS Department, Massachusetts Institute of Technology, afallah@mit.edu; Kristian Georgiev, EECS Department, Massachusetts Institute of Technology, krisgrg@mit.edu; Aryan Mokhtari, ECE Department, The University of Texas at Austin, mokhtari@austin.utexas.edu; Asuman Ozdaglar, EECS Department, Massachusetts Institute of Technology, asuman@mit.edu |
| Pseudocode | Yes | Algorithm 1: Proposed SG-MRL method for Meta-RL |
| Open Source Code | Yes | The code is available online. The code is available at https://github.com/kristian-georgiev/SGMRL. |
| Open Datasets | No | The paper describes the 2D-navigation and MuJoCo locomotion environments and tasks used for experiments, but it does not provide concrete access information (e.g., links, DOIs, citations with authors/year, or specific repository names) for publicly available datasets used in the training process. |
| Dataset Splits | No | The paper does not explicitly provide specific numerical training, validation, or test dataset splits (e.g., percentages or sample counts). While it discusses training and meta-testing in the context of tasks, it does not refer to traditional dataset partitions. |
| Hardware Specification | No | The paper states that "All experiments were conducted in MIT's Supercloud [28]", but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using "a neural network policy" and optimizing with "vanilla policy gradient" and the "MuJoCo library", but it does not specify version numbers for these software components or other dependencies. |
| Experiment Setup | No | The paper mentions general aspects of its method, such as learning rates (α, β) and batch sizes (B, D_in, D_o), but it does not provide specific numerical values for hyperparameters or other explicit system-level training settings in the main text. It states that "Further implementation details are outlined in Appendix H." |
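The Pseudocode row above refers to the paper's Algorithm 1, an SG-MRL (stochastic gradient meta-RL) method with a MAML-style inner-adaptation / outer-meta-update structure. As a minimal sketch of that structure only: the toy 1-D quadratic "reward", the function names, and the step sizes below are all illustrative assumptions, not the paper's actual policy-gradient algorithm or hyperparameters.

```python
import numpy as np

# Hypothetical toy sketch of a MAML-style inner/outer loop in the spirit of
# SG-MRL's Algorithm 1. Each "task" is a 1-D quadratic objective maximized at
# a task-specific point c; the real algorithm instead uses stochastic policy
# gradients estimated from sampled trajectories.

def reward(theta, c):
    """Per-task surrogate objective, maximized at theta == c (assumption)."""
    return -(theta - c) ** 2

def reward_grad(theta, c):
    """Exact gradient of the toy objective."""
    return -2.0 * (theta - c)

def meta_step(theta, task_batch, alpha=0.1, beta=0.5):
    """One outer update: adapt per task with a single inner gradient step,
    then ascend the average post-adaptation objective (chain rule through
    the inner step)."""
    meta_grad = 0.0
    for c in task_batch:
        adapted = theta + alpha * reward_grad(theta, c)  # inner adaptation
        # d(adapted)/d(theta) = 1 - 2*alpha for this quadratic objective
        meta_grad += reward_grad(adapted, c) * (1.0 - 2.0 * alpha)
    meta_grad /= len(task_batch)
    return theta + beta * meta_grad                      # outer ascent

rng = np.random.default_rng(0)
tasks = rng.normal(loc=1.0, scale=0.2, size=8)  # task-specific optima
theta = -3.0
for _ in range(50):
    theta = meta_step(theta, tasks)
# theta should approach the mean of the task optima (about 1.0 here)
```

The inner step is where the paper's debiasing concern arises: in the RL setting, the post-adaptation gradient must be estimated from trajectories sampled under the adapted policy, and naive estimators of this two-level objective are biased.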