Taming MAML: Efficient unbiased meta-reinforcement learning
Authors: Hao Liu, Richard Socher, Caiming Xiong
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments. |
| Researcher Affiliation | Industry | Hao Liu, Richard Socher, Caiming Xiong (Salesforce Research, Palo Alto, USA). Correspondence to: Hao Liu <lhao499@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Learning Meta Control Variates |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper mentions using "OpenAI Gym (Brockman et al., 2016) in the MuJoCo physics simulator (Todorov et al., 2012)" and describes how tasks are generated (e.g., "The tasks are generated by sampling the target positions from the uniform distribution on [-3, 3]^2"), but it does not provide concrete access information for specific datasets or environments required for replication (a task-sampling sketch follows the table). |
| Dataset Splits | No | The paper mentions "We use the first half trajectories to do adaptation of meta control variates and then compute inner loss of meta control variates on the other half of trajectories...", which describes an internal per-task split used for learning the control variates, but it does not specify overall training, validation, or test dataset splits for the model (a trajectory-split sketch follows the table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions "OpenAI Gym (Brockman et al., 2016)" and the "MuJoCo physics simulator (Todorov et al., 2012)" and refers to "Proximal Policy Optimization (PPO) (Schulman et al., 2017)" and the "TRPO algorithm", but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper states: "Hyperparameters used in each environment can be found in supplementary file." This indicates that specific experimental setup details are not present in the main text. |
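
As an illustration of the task-generation detail quoted in the Open Datasets row, here is a minimal sketch of sampling 2-D target positions uniformly from [-3, 3]^2. This is a hedged reconstruction under stated assumptions; the function name `sample_tasks` is illustrative and not taken from the paper's code.

```python
import numpy as np

def sample_tasks(num_tasks, rng=None):
    """Draw 2-D goal positions uniformly from [-3, 3]^2, as described in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    # Each task is a target (x, y) position; result has shape (num_tasks, 2).
    return rng.uniform(low=-3.0, high=3.0, size=(num_tasks, 2))

# Example: sample 40 meta-training tasks.
tasks = sample_tasks(40)
```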
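The per-task trajectory split quoted in the Dataset Splits row can be pictured with the sketch below. `adapt_control_variates` and `inner_loss` are hypothetical placeholders standing in for the paper's meta-control-variate routines; this is a sketch of the split, not the authors' implementation.

```python
def split_and_evaluate(trajectories, adapt_control_variates, inner_loss):
    """Adapt meta control variates on the first half of the sampled trajectories,
    then compute their inner loss on the held-out second half, mirroring the
    per-task split described in the paper."""
    mid = len(trajectories) // 2
    adapt_half, eval_half = trajectories[:mid], trajectories[mid:]

    # Adaptation step: fit the control-variate parameters on the first half.
    cv_params = adapt_control_variates(adapt_half)

    # Evaluation step: inner loss of the adapted control variates on the rest.
    return inner_loss(cv_params, eval_half)
```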