Meta-learning Parameterized Skills

Authors: Haotian Fu, Shangqun Yu, Saket Tiwari, Michael Littman, George Konidaris

ICML 2023

Each entry below lists a reproducibility variable, the extracted result, and the supporting evidence quoted from the paper.
Research Type: Experimental. Evidence: "We empirically demonstrate that the proposed algorithms enable an agent to solve a set of difficult long-horizon (obstacle-course and robot manipulation) tasks. ... Using the proposed algorithm, we are able to solve a set of difficult long-horizon ant obstacle course tasks, as well as long-horizon robotic manipulation tasks. We demonstrate the importance of smoothness for a learned parameterized action space and the effectiveness of the different components of our algorithm independently."
Researcher Affiliation: Academia. Evidence: "Haotian Fu (1), Shangqun Yu (2), Saket Tiwari (1), Michael Littman (1), George Konidaris (1). (1) Department of Computer Science, Brown University; (2) The University of Massachusetts Amherst."
Pseudocode: Yes. Evidence: "Algorithm 1: Meta-Learning Parameterized Skill (MLPS) Meta-training (regular encoder network) ... Algorithm 2: Parameterized Skill Learning (MLPS) Meta-training (sequential encoder network)"
Open Source Code: Yes. Evidence: "Our code is available at https://github.com/Minusadd/Meta-learning-parameterized-skills."
Open Datasets: Yes. Evidence: "we evaluate our algorithm on an Ant obstacle course domain built on OpenAI Gym [6] and a robotic manipulation domain from Meta-World [63]."
Dataset Splits: No. The paper mentions 'meta-train tasks' and 'test tasks' and describes how test tasks are sampled ('40 test tasks', 'linearly sampled from the given task distribution'), but it does not specify training/validation/test splits as percentages or sample counts, nor does it reference standard, reproducible splits for a single dataset.
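The test-task construction the paper does describe ("40 test tasks ... linearly sampled from the given task distribution") can be sketched as follows. This is a minimal illustration, assuming a one-dimensional task parameter with a bounded support `[low, high]`; the function name and bounds are hypothetical, not from the paper.

```python
import numpy as np

def sample_test_tasks(low, high, n_tasks=40):
    """Linearly sample task parameters (e.g., goal positions) across the
    support [low, high] of the task distribution. Illustrative only: the
    paper states the count (40) and the linear sampling, not the bounds."""
    return np.linspace(low, high, n_tasks)

test_tasks = sample_test_tasks(-1.0, 1.0)
print(len(test_tasks))  # 40
```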
Hardware Specification: No. Evidence: "This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University." The paper mentions computational resources but does not specify hardware details such as GPU models, CPU types, or memory.
Software Dependencies: Yes. Evidence: "We run all experiments with the MuJoCo simulator [60]... For SAC, we use the stable-baselines3 implementation [49]."
Experiment Setup: Yes. Evidence: "Both the actor network and critic network in MLPS are parameterized MLPs with 2 hidden layers of (300, 300) units. The context/trajectory encoder network is modeled as a product of independent Gaussian factors, with 3 hidden layers of (400, 400, 400) units. We set the learning rate to 3e-4. The scale of the KL divergence loss is set to 0.1." Table 1 (MLPS's hyperparameters) lists, per environment: number of meta-train tasks, α, β, κ, meta batch size, and embedding batch size.
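The reported hyperparameters and the product-of-independent-Gaussians encoder can be summarized in a short sketch. The layer widths, learning rate, and KL scale below come from the paper; the precision-weighted combination is the standard way to multiply independent Gaussian factors, and everything else (variable names, the example inputs) is illustrative.

```python
import numpy as np

# Hyperparameters reported in the paper's experiment setup.
ACTOR_CRITIC_HIDDEN = (300, 300)   # actor/critic MLPs: 2 hidden layers
ENCODER_HIDDEN = (400, 400, 400)   # context/trajectory encoder: 3 hidden layers
LEARNING_RATE = 3e-4
KL_LOSS_SCALE = 0.1

def product_of_gaussians(mus, sigmas_sq):
    """Combine independent Gaussian factors N(mu_i, sigma_i^2), one per
    context transition, into a single Gaussian by multiplying densities:
    summed precisions give the posterior variance, and the posterior mean
    is the precision-weighted average of the factor means."""
    precisions = 1.0 / sigmas_sq
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, var

# Two identical unit-variance factors: the variance halves, the mean stays.
mus = np.array([[0.0], [0.0]])
sigmas_sq = np.array([[1.0], [1.0]])
mu, var = product_of_gaussians(mus, sigmas_sq)
print(mu, var)  # [0.] [0.5]
```

Combining per-transition factors this way keeps the posterior permutation-invariant over the context set, which is why product-of-Gaussians encoders are a common choice for context aggregation.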