Meta-learning Parameterized Skills
Authors: Haotian Fu, Shangqun Yu, Saket Tiwari, Michael Littman, George Konidaris
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that the proposed algorithms enable an agent to solve a set of difficult long-horizon (obstacle-course and robot manipulation) tasks. ... Using the proposed algorithm, we are able to solve a set of difficult long-horizon ant obstacle course tasks, as well as long-horizon robotic manipulation tasks. We demonstrate the importance of smoothness for a learned parameterized action space and the effectiveness of the different components of our algorithm independently. |
| Researcher Affiliation | Academia | Haotian Fu¹, Shangqun Yu², Saket Tiwari¹, Michael Littman¹, George Konidaris¹ (¹Department of Computer Science, Brown University; ²University of Massachusetts Amherst). |
| Pseudocode | Yes | Algorithm 1: Meta-Learning Parameterized Skill (MLPS) meta-training (regular encoder network) ... Algorithm 2: Parameterized Skill Learning (MLPS) meta-training (sequential encoder network) |
| Open Source Code | Yes | Our code is available at https://github.com/Minusadd/Meta-learning-parameterized-skills. |
| Open Datasets | Yes | we evaluate our algorithm on an Ant obstacle course domain built on OpenAI Gym [6] and a robotic manipulation domain from Meta-World [63]. |
| Dataset Splits | No | The paper mentions 'meta-train tasks' and 'test tasks' and describes how test tasks are sampled ('40 test tasks', 'linearly sampled from the given task distribution'), but it does not explicitly specify train/validation/test splits as percentages, sample counts, or references to standard, reproducible splits. |
| Hardware Specification | No | This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University. The paper mentions using 'computational resources', but does not specify exact hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | Yes | We run all experiments with the MuJoCo simulator [60]... For SAC, we use the stable-baselines3 implementation [49]. (A minimal sketch of this SAC setup appears below the table.) |
| Experiment Setup | Yes | Both the actor network and the critic network in MLPS are parameterized MLPs with 2 hidden layers of (300, 300) units. The context/trajectory encoder network is modeled as a product of independent Gaussian factors, with 3 hidden layers of (400, 400, 400) units. We set the learning rate to 3e-4. The scale of the KL divergence loss is set to 0.1. ... Table 1 (MLPS's hyperparameters) lists, per environment: # meta-train tasks, α, β, κ, meta batch size, and embedding batch size. (A sketch of these network shapes appears below the table.) |
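
The software-dependency row quotes the SAC baseline built on stable-baselines3 and MuJoCo. Below is a minimal, hedged sketch of that setup: the environment id (`Ant-v3`) and the timestep budget are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of the SAC baseline described in the paper, using the
# stable-baselines3 implementation [49] on a MuJoCo task from OpenAI Gym [6].
# ASSUMPTIONS: the env id and total_timesteps below are illustrative only.
import gym
from stable_baselines3 import SAC

env = gym.make("Ant-v3")          # MuJoCo locomotion environment
model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,           # matches the learning rate reported in the paper
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("sac_ant_baseline")
```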
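The experiment-setup row pins down the network shapes: (300, 300) actor/critic MLPs and a (400, 400, 400) context encoder whose posterior is a product of independent Gaussian factors. The PyTorch sketch below is a hedged reconstruction under assumed input/latent dimensions; the fusion rule is the standard product-of-Gaussians formula, not code from the authors' repository.

```python
# Hedged PyTorch sketch of the reported network shapes. Only the hidden-layer
# sizes, the learning rate, and the product-of-Gaussians encoder come from the
# paper; obs_dim, act_dim, ctx_dim, and latent_dim are ASSUMED placeholders.
import torch
import torch.nn as nn

def mlp(sizes):
    """Build an MLP with ReLU activations between all but the last layer."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*layers)

obs_dim, act_dim, ctx_dim, latent_dim = 27, 8, 37, 5   # assumed dimensions

actor = mlp([obs_dim + latent_dim, 300, 300, act_dim])        # 2 hidden layers, (300, 300)
critic = mlp([obs_dim + act_dim + latent_dim, 300, 300, 1])   # 2 hidden layers, (300, 300)
encoder = mlp([ctx_dim, 400, 400, 400, 2 * latent_dim])       # 3 hidden layers; outputs (mu, log_var)

def product_of_gaussians(mus, log_vars):
    """Fuse per-transition Gaussian factors N(mu_i, sigma_i^2) into one posterior."""
    precisions = torch.exp(-log_vars)           # 1 / sigma_i^2
    var = 1.0 / precisions.sum(dim=0)           # combined variance
    mu = var * (precisions * mus).sum(dim=0)    # precision-weighted mean
    return mu, var

# Encode a dummy batch of 16 context transitions and fuse the factors:
context = torch.randn(16, ctx_dim)
mu, log_var = encoder(context).chunk(2, dim=-1)
z_mu, z_var = product_of_gaussians(mu, log_var)
z = z_mu + z_var.sqrt() * torch.randn_like(z_mu)  # reparameterized sample

optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()) + list(encoder.parameters()),
    lr=3e-4,                                      # learning rate from the paper
)
```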