Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically compare SG-MRL and MAML in several deep RL environments. We also empirically validate the proposed SG-MRL algorithm in larger-scale environments standard in modern reinforcement learning applications, including a 2D-navigation problem and a more challenging locomotion problem simulated with the MuJoCo library.
Researcher Affiliation | Academia | Alireza Fallah (EECS Department, Massachusetts Institute of Technology); Kristian Georgiev (EECS Department, Massachusetts Institute of Technology); Aryan Mokhtari (ECE Department, The University of Texas at Austin); Asuman Ozdaglar (EECS Department, Massachusetts Institute of Technology)
Pseudocode | Yes | Algorithm 1: Proposed SG-MRL method for Meta-RL (an illustrative sketch of the meta-update follows the table)
Open Source Code | Yes | The code is available online at https://github.com/kristian-georgiev/SGMRL.
Open Datasets | No | The paper describes the 2D-navigation and MuJoCo locomotion environments and tasks used for experiments, but it does not provide concrete access information (e.g., links, DOIs, citations with authors/year, or specific repository names) for publicly available datasets used in the training process.
Dataset Splits | No | The paper does not explicitly provide specific numerical training, validation, or test dataset splits (e.g., percentages or sample counts). While it discusses training and meta-testing in the context of tasks, it does not refer to traditional dataset partitions.
Hardware Specification | No | The paper states that "All experiments were conducted in MIT's Supercloud [28]", but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions using "a neural network policy" and optimizing with "vanilla policy gradient" and the MuJoCo library, but it does not specify version numbers for these software components or other dependencies.
Experiment Setup | No | The paper mentions general aspects of its method, such as learning rates (α, β) and batch sizes (B, D_in, D_o), but it does not provide specific numerical values for hyperparameters or other explicit system-level training settings in the main text. It states that "Further implementation details are outlined in Appendix H."
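The Pseudocode and Experiment Setup rows above reference Algorithm 1 (SG-MRL) with an inner step size α, a meta step size β, and batch sizes B, D_in, D_o. As a rough illustration only, the following PyTorch sketch shows the generic MAML-style meta-update structure that SG-MRL builds on, with the inner adaptation step kept inside the autograd graph so the meta-gradient retains its second-order term. The `surrogate_loss` function, the toy task data, and all numeric values are hypothetical stand-ins; this is not the paper's implementation (see the authors' repository for that), and it does not reproduce SG-MRL's specific unbiased policy-gradient estimator.

```python
import torch

def surrogate_loss(theta, data):
    # Hypothetical placeholder objective. In meta-RL this would be a
    # policy-gradient surrogate estimated from trajectories sampled
    # with the policy parameterized by `theta`.
    return ((theta - data) ** 2).mean()

alpha, beta = 0.1, 0.01                      # inner (α) and meta (β) step sizes
theta = torch.zeros(4, requires_grad=True)   # meta-parameters

for iteration in range(100):
    tasks = [torch.randn(4) for _ in range(8)]   # toy stand-in for a batch B of tasks
    meta_grad = torch.zeros_like(theta)
    for task in tasks:
        d_in, d_out = task, task + 0.1 * torch.randn(4)  # stand-ins for D_in, D_o batches

        # Inner adaptation: one gradient step per task, with create_graph=True
        # so the meta-gradient below differentiates through this step
        # (the second-order term that debiased/exact MAML-style methods keep).
        inner_loss = surrogate_loss(theta, d_in)
        grad = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
        theta_adapted = theta - alpha * grad

        # Outer objective evaluated at the adapted parameters; its gradient
        # w.r.t. theta flows back through the inner step.
        outer_loss = surrogate_loss(theta_adapted, d_out)
        meta_grad += torch.autograd.grad(outer_loss, theta)[0]

    # Meta-update, averaged over the task batch.
    with torch.no_grad():
        theta -= beta * meta_grad / len(tasks)
```

In the actual meta-RL setting, both losses are policy-gradient surrogates estimated from sampled trajectories, and the paper's contribution is an unbiased stochastic estimate of this meta-gradient, which a deterministic toy surrogate cannot capture.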