Distributionally Adaptive Meta Reinforcement Learning

Authors: Anurag Ajay, Abhishek Gupta, Dibya Ghosh, Sergey Levine, Pulkit Agrawal

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments verify the utility of adaptive distributional robustness under test-time task distribution shift in a number of simulated robotics domains."
Researcher Affiliation | Collaboration | Anurag Ajay, Abhishek Gupta*, Dibya Ghosh, Sergey Levine, Pulkit Agrawal; Improbable AI Lab, MIT-IBM Watson AI Lab, University of California, Berkeley, Massachusetts Institute of Technology
Pseudocode | Yes | Algorithm 1 "DiAMetR: Meta-training phase"; Algorithm 2 "DiAMetR: Meta-test phase"
Open Source Code | Yes | "We have included the code along with a README in the supplemental material."
Open Datasets | No | The paper describes the tasks and distributions used (e.g., 'Ant navigation', 'Wind navigation', 'Object localization', 'Block push') and points to Appendix D and Table 2 for details, but it provides no concrete access information (link, DOI, repository, or formal citation) confirming that these task distributions are publicly available datasets. It states that the authors design 'various meta RL tasks from these environments', implying custom distributions built on existing environments.
Dataset Splits | No | The paper refers to a 'train task distribution' and 'test task distributions' and evaluates performance on them. Algorithm 1 also includes a validation step that enforces D_KL(p_train || q_ϕ) < ε through a Lagrange multiplier, but this is a constraint check rather than a data split (see the sketch after this table). The paper does not give percentages or sample counts for training, validation, and test data splits, as required for data-partitioning reproducibility.
Hardware Specification | No | The paper thanks 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing compute resources' but does not list specific hardware such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using 'off-policy RL2 [31]' as a base learner and 'dual gradient descent [30]' but does not give version numbers for any software dependencies.
Experiment Setup | No | The paper describes the overall setup and environment types (e.g., Ant navigation, Wind navigation), points to the 'exact task distributions in Table 2' and to 'Appendix D for details about reward function and dynamics', and mentions finetuning with 250 meta-episodes in comparisons. However, the main text lacks specific hyperparameters (e.g., learning rates, batch sizes, optimizer details) and has no clearly labeled experimental setup section.
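
The "Dataset Splits" and "Software Dependencies" rows note that Algorithm 1 enforces the distribution-shift constraint D_KL(p_train || q_ϕ) < ε with a Lagrange multiplier updated by dual gradient descent. The snippet below is only an illustrative sketch of that constrained-optimization pattern, not the authors' implementation: the univariate Gaussian task distributions, the names mu_q, log_sigma_q, log_lam, and the toy task_loss (standing in for the meta-policy's expected negative return under q_ϕ) are all assumptions made for this example.

```python
import torch

# Illustrative sketch only (not the authors' code): dual gradient descent that
# adapts a Gaussian "task distribution" q_phi adversarially while a Lagrange
# multiplier enforces D_KL(p_train || q_phi) < eps, mirroring the constraint
# described for Algorithm 1. task_loss is a toy stand-in for the meta-policy's
# expected negative return on tasks drawn from q_phi.

mu_train, sigma_train = torch.tensor(0.0), torch.tensor(1.0)  # fixed p_train
mu_q = torch.tensor(0.5, requires_grad=True)                   # learned mean of q_phi
log_sigma_q = torch.tensor(0.0, requires_grad=True)            # learned log-std of q_phi
log_lam = torch.tensor(0.0, requires_grad=True)                # Lagrange multiplier (log-space, so it stays positive)
eps = 0.1                                                      # allowed distribution shift

opt_q = torch.optim.Adam([mu_q, log_sigma_q], lr=1e-2)
opt_lam = torch.optim.Adam([log_lam], lr=1e-2)

def kl_train_q():
    """Closed-form KL between univariate Gaussians: D_KL(p_train || q_phi)."""
    sigma_q = log_sigma_q.exp()
    return (torch.log(sigma_q / sigma_train)
            + (sigma_train ** 2 + (mu_train - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)

for step in range(2000):
    # Primal step: make q_phi more adversarial, penalized by the (detached) multiplier.
    task_loss = -(mu_q ** 2)  # toy objective: push tasks away from the train mean
    lagrangian = task_loss + log_lam.exp().detach() * (kl_train_q() - eps)
    opt_q.zero_grad()
    lagrangian.backward()
    opt_q.step()

    # Dual step: raise lambda when D_KL(p_train || q_phi) exceeds eps, lower it otherwise.
    lam_loss = -log_lam.exp() * (kl_train_q().detach() - eps)
    opt_lam.zero_grad()
    lam_loss.backward()
    opt_lam.step()

print(f"final KL(p_train || q_phi) = {kl_train_q().item():.3f} (constraint: < {eps})")
```

Parameterizing the multiplier in log space keeps it nonnegative; the dual step grows it whenever the KL constraint is violated and relaxes it once q_ϕ is back within ε of the training distribution, which is the general pattern the paper's validation step describes.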