Taming MAML: Efficient unbiased meta-reinforcement learning
Authors: Hao Liu, Richard Socher, Caiming Xiong
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments. |
| Researcher Affiliation | Industry | Hao Liu, Richard Socher, Caiming Xiong (Salesforce Research, Palo Alto, USA). Correspondence to: Hao Liu <lhao499@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Learning Meta Control Variates |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper mentions using "OpenAI Gym (Brockman et al., 2016) in the MuJoCo physics simulator (Todorov et al., 2012)" and describes how tasks are generated (e.g., "The tasks are generated by sampling the target positions from the uniform distribution on [-3, 3]^2"), but it does not provide concrete access information for specific datasets or environments required for replication (a task-sampling sketch follows the table). |
| Dataset Splits | No | The paper mentions "We use the first half trajectories to do adaptation of meta control variates and then compute inner loss of meta control variates on the other half of trajectories...", which describes an internal per-task split used for learning the control variates, but it does not specify overall training, validation, or test dataset splits for the model (a trajectory-split sketch follows the table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions "OpenAI Gym (Brockman et al., 2016)" and the "MuJoCo physics simulator (Todorov et al., 2012)" and refers to "Proximal Policy Optimization (PPO) (Schulman et al., 2017)" and the "TRPO algorithm", but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper states: "Hyperparameters used in each environment can be found in supplementary file." This indicates that specific experimental setup details are not present in the main text. |
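
As an illustration of the task-generation detail quoted in the Open Datasets row, here is a minimal sketch of sampling 2-D target positions uniformly from [-3, 3]^2. This is a hedged reconstruction under stated assumptions; the function name `sample_tasks` is illustrative and not taken from the paper's code.

```python
import numpy as np

def sample_tasks(num_tasks, rng=None):
    """Draw 2-D goal positions uniformly from [-3, 3]^2, as described in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    # Each task is a target (x, y) position; result has shape (num_tasks, 2).
    return rng.uniform(low=-3.0, high=3.0, size=(num_tasks, 2))

# Example: sample 40 meta-training tasks.
tasks = sample_tasks(40)
```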
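The per-task trajectory split quoted in the Dataset Splits row can be pictured with the sketch below. `adapt_control_variates` and `inner_loss` are hypothetical placeholders standing in for the paper's meta-control-variate routines; this is a sketch of the split, not the authors' implementation.

```python
def split_and_evaluate(trajectories, adapt_control_variates, inner_loss):
    """Adapt meta control variates on the first half of the sampled trajectories,
    then compute their inner loss on the held-out second half, mirroring the
    per-task split described in the paper."""
    mid = len(trajectories) // 2
    adapt_half, eval_half = trajectories[:mid], trajectories[mid:]

    # Adaptation step: fit the control-variate parameters on the first half.
    cv_params = adapt_control_variates(adapt_half)

    # Evaluation step: inner loss of the adapted control variates on the rest.
    return inner_loss(cv_params, eval_half)
```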