On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning
Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare SG-MRL and MAML in several deep RL environments. We also empirically validate the proposed SG-MRL algorithm in larger-scale environments standard in modern reinforcement learning applications, including a 2D-navigation problem and a more challenging locomotion problem simulated with the MuJoCo library. |
| Researcher Affiliation | Academia | Alireza Fallah, EECS Department, Massachusetts Institute of Technology, afallah@mit.edu; Kristian Georgiev, EECS Department, Massachusetts Institute of Technology, krisgrg@mit.edu; Aryan Mokhtari, ECE Department, The University of Texas at Austin, mokhtari@austin.utexas.edu; Asuman Ozdaglar, EECS Department, Massachusetts Institute of Technology, asuman@mit.edu |
| Pseudocode | Yes | Algorithm 1: Proposed SG-MRL method for Meta-RL |
| Open Source Code | Yes | The code is available online. The code is available at https://github.com/kristian-georgiev/SGMRL. |
| Open Datasets | No | The paper describes the 2D-navigation and MuJoCo locomotion environments and tasks used for experiments, but it does not provide concrete access information (e.g., links, DOIs, citations with authors/year, or specific repository names) for publicly available datasets used in the training process. |
| Dataset Splits | No | The paper does not explicitly provide specific numerical training, validation, or test dataset splits (e.g., percentages or sample counts). While it discusses training and meta-testing in the context of tasks, it does not refer to traditional dataset partitions. |
| Hardware Specification | No | The paper states that "All experiments were conducted in MIT's Supercloud [28]", but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using "a neural network policy" and optimizing with "vanilla policy gradient" and the "MuJoCo library", but it does not specify version numbers for these software components or other dependencies. |
| Experiment Setup | No | The paper mentions general aspects of its method, such as learning rates (α, β) and batch sizes (B, D_in, D_o), but it does not provide specific numerical values for hyperparameters or other explicit system-level training settings in the main text. It states that "Further implementation details are outlined in Appendix H." |
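The Pseudocode row above refers to the paper's Algorithm 1, an SG-MRL (stochastic gradient meta-RL) method with a MAML-style inner-adaptation / outer-meta-update structure. As a minimal sketch of that structure only: the toy 1-D quadratic "reward", the function names, and the step sizes below are all illustrative assumptions, not the paper's actual policy-gradient algorithm or hyperparameters.

```python
import numpy as np

# Hypothetical toy sketch of a MAML-style inner/outer loop in the spirit of
# SG-MRL's Algorithm 1. Each "task" is a 1-D quadratic objective maximized at
# a task-specific point c; the real algorithm instead uses stochastic policy
# gradients estimated from sampled trajectories.

def reward(theta, c):
    """Per-task surrogate objective, maximized at theta == c (assumption)."""
    return -(theta - c) ** 2

def reward_grad(theta, c):
    """Exact gradient of the toy objective."""
    return -2.0 * (theta - c)

def meta_step(theta, task_batch, alpha=0.1, beta=0.5):
    """One outer update: adapt per task with a single inner gradient step,
    then ascend the average post-adaptation objective (chain rule through
    the inner step)."""
    meta_grad = 0.0
    for c in task_batch:
        adapted = theta + alpha * reward_grad(theta, c)  # inner adaptation
        # d(adapted)/d(theta) = 1 - 2*alpha for this quadratic objective
        meta_grad += reward_grad(adapted, c) * (1.0 - 2.0 * alpha)
    meta_grad /= len(task_batch)
    return theta + beta * meta_grad                      # outer ascent

rng = np.random.default_rng(0)
tasks = rng.normal(loc=1.0, scale=0.2, size=8)  # task-specific optima
theta = -3.0
for _ in range(50):
    theta = meta_step(theta, tasks)
# theta should approach the mean of the task optima (about 1.0 here)
```

The inner step is where the paper's debiasing concern arises: in the RL setting, the post-adaptation gradient must be estimated from trajectories sampled under the adapted policy, and naive estimators of this two-level objective are biased.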