CoMPS: Continual Meta Policy Search

Authors: Glen Berseth, Zhiwei Zhang, Grace Zhang, Chelsea Finn, Sergey Levine

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Our experiments show that as the agent experiences more tasks, learning time on new tasks decreases, indicating that meta-reinforcement learning performance increases asymptotically with the number of tasks.' 'Our experiments aim to analyze the performance of CoMPS on both stationary and non-stationary task sequences in the continual meta-learning setting, where each task is observed once and never revisited again during the learning process.'
Researcher Affiliation | Academia | 'Anonymous authors. Paper under double-blind review.'
Pseudocode | Yes | 'Algorithm 1 CoMPS Meta-Learning'
Open Source Code | Yes | 'We have also provided the code used for the experiments.'
Open Datasets | Yes | 'Last, we utilize the suite of robotic manipulation tasks from Meta-World Yu et al. (2020b)'
Dataset Splits | No | The paper mentions 'held-out tasks' but does not specify explicit training/validation/test dataset splits with percentages or counts.
Hardware Specification | No | The paper mentions 'virtual machines with 8 CPUs and 16 GiB of memory' and 'cloud computing technology (AWS/GCP/Azure/slurm)', but lacks specific CPU/GPU models or detailed cloud instance specifications.
Software Dependencies | No | The paper mentions software components like PPO but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | 'For the RL part of CoMPS, GMPS+PPO, PPO+TL, and PNC we explored the best hyperparameters... For the meta-step (M) that CoMPS and GMPS+PPO share we tuned those algorithms using a similar process. We searched for the best parameter values of learning rate, the number of training updates per batch of data, alpha, batch size, and the number of total iterations after each new task. We selected the parameters that achieved the highest stable rewards during training. We used the same values for both CoMPS and GMPS+PPO. For PNC we also performed hyperparameter tuning over the learning rate, the number of consolidation/compress steps, and EWC weight. For PEARL we performed hyperparameter tuning over the learning rate, batch size and the number of gradient steps.' 'Table 1: CoMPS Hyperparameters'
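
The tuning procedure quoted in the Experiment Setup row amounts to a grid search: evaluate each combination of hyperparameter values and keep the one that achieves the highest stable training reward. Below is a minimal Python sketch of such a sweep. The value grids, the run_meta_step stub, and the reward aggregation are hypothetical placeholders for illustration; they are not the authors' released tuning code.

# Minimal sketch of the hyperparameter sweep described above (illustrative only).
# The grids and the run_meta_step() stub are assumptions, not the paper's code.
import itertools

# Assumed search space for the shared CoMPS / GMPS+PPO meta-step.
SEARCH_SPACE = {
    "learning_rate": [3e-4, 1e-4, 3e-5],
    "updates_per_batch": [1, 4, 8],
    "alpha": [0.05, 0.1, 0.5],
    "batch_size": [32, 64, 128],
    "iterations_per_task": [50, 100],
}

def run_meta_step(config):
    """Hypothetical stub: launch one training run with `config` and
    return the average reward over the stable part of training."""
    # A real sweep would call the meta-training loop here.
    return 0.0

def grid_search(search_space, run_fn):
    """Evaluate every combination and keep the configuration with the
    highest stable training reward, as the quoted setup describes."""
    best_config, best_reward = None, float("-inf")
    keys = sorted(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        config = dict(zip(keys, values))
        reward = run_fn(config)
        if reward > best_reward:
            best_config, best_reward = config, reward
    return best_config, best_reward

if __name__ == "__main__":
    best, reward = grid_search(SEARCH_SPACE, run_meta_step)
    print("best config:", best, "reward:", reward)

The same pattern would cover the PNC and PEARL sweeps mentioned in the quote by swapping in their respective grids (consolidation/compress steps and EWC weight, or gradient steps).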