Distributionally Adaptive Meta Reinforcement Learning
Authors: Anurag Ajay, Abhishek Gupta, Dibya Ghosh, Sergey Levine, Pulkit Agrawal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments verify the utility of adaptive distributional robustness under test-time task distribution shift in a number of simulated robotics domains. |
| Researcher Affiliation | Collaboration | Anurag Ajay, Abhishek Gupta*, Dibya Ghosh, Sergey Levine, Pulkit Agrawal (Improbable AI Lab, MIT-IBM Watson AI Lab, University of California, Berkeley, Massachusetts Institute of Technology) |
| Pseudocode | Yes | Algorithm 1 DiAMetR: Meta-training phase; Algorithm 2 DiAMetR: Meta-test phase |
| Open Source Code | Yes | We have included the code along with a README in the supplemental material |
| Open Datasets | No | The paper describes the tasks and distributions used (e.g., 'Ant navigation', 'Wind navigation', 'Object localization', 'Block push') and references Appendix D and Table 2 for details, but it provides no concrete access information (link, DOI, repository, or formal citation) confirming that these task distributions are publicly available datasets. The authors state that they design 'various meta RL tasks from these environments', implying custom distributions built on existing environments. |
| Dataset Splits | No | The paper mentions a 'train task distribution' and 'test task distributions' and evaluates performance on them. It also describes a 'validation' step in Algorithm 1 where the constraint $D_{\mathrm{KL}}(p_{\text{train}} \| q_\phi) < \epsilon$ is enforced through a Lagrange multiplier (see the sketch after this table); this is a constraint check rather than a data-split validation. The paper does not provide percentages or sample counts for training, validation, and test *data splits* in the way required for data-partitioning reproducibility. |
| Hardware Specification | No | The paper mentions 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing compute resources' but does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using 'off-policy RL² [31] as a base learner' and 'dual gradient descent [30]' but does not provide version numbers for any software dependencies. |
| Experiment Setup | No | The paper describes the overall setup and environment types (e.g., Ant navigation, Wind navigation), points to the 'exact task distributions in Table 2' and to 'Appendix D for details about reward function and dynamics', and mentions finetuning with 250 meta-episodes in comparisons. However, it lacks specific hyperparameters (e.g., learning rates, batch sizes, optimizer details) in the main text and has no clearly labeled experimental setup section. |
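For reference, the constrained step quoted in the Dataset Splits row, enforcing $D_{\mathrm{KL}}(p_{\text{train}} \| q_\phi) < \epsilon$ through a Lagrange multiplier updated by dual gradient descent [30], can be sketched as follows. This is a minimal illustration under an assumed Gaussian task parameterization; the stand-in loss, the budget `eps`, the learning rates, and all variable names are hypothetical and not the authors' implementation.

```python
import torch
import torch.distributions as D

# Minimal sketch (not the paper's code) of enforcing D_KL(p_train || q_phi) < eps
# with a Lagrange multiplier trained by dual gradient descent, as described for
# Algorithm 1. The Gaussian parameterization, stand-in loss, eps, and learning
# rates are all assumptions for illustration.

p_train = D.Normal(torch.tensor(0.0), torch.tensor(1.0))  # fixed train-task distribution

mu = torch.tensor(0.0, requires_grad=True)         # parameters phi of q_phi
log_sigma = torch.tensor(0.0, requires_grad=True)
log_lam = torch.tensor(0.0, requires_grad=True)    # dual variable; log-space keeps lambda > 0

eps = 0.1  # KL budget on how far q_phi may drift from p_train
primal_opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
dual_opt = torch.optim.Adam([log_lam], lr=1e-2)

def widening_loss(q):
    # Stand-in for the adversarial meta-RL objective: it rewards widening the
    # task distribution, so the KL constraint is what keeps q_phi near p_train.
    return -q.scale

for step in range(1000):
    q_phi = D.Normal(mu, log_sigma.exp())
    kl = D.kl_divergence(p_train, q_phi)           # D_KL(p_train || q_phi)
    lam = log_lam.exp()

    # Primal step on phi: task objective plus Lagrangian penalty on the constraint.
    primal_loss = widening_loss(q_phi) + lam.detach() * (kl - eps)
    primal_opt.zero_grad()
    primal_loss.backward()
    primal_opt.step()

    # Dual step on lambda: gradient ascent on the Lagrangian, so lambda grows
    # while the constraint is violated (kl > eps) and shrinks once it holds.
    dual_loss = -(log_lam.exp() * (kl.detach() - eps))
    dual_opt.zero_grad()
    dual_loss.backward()
    dual_opt.step()
```

Keeping the multiplier in log-space is a common way to guarantee non-negativity; the paper only states that the constraint is enforced through a Lagrange multiplier, so the exact parameterization and update schedule here are assumptions.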