Meta-Controller: Few-Shot Imitation of Unseen Embodiments and Tasks in Continuous Control
Authors: Seongwoong Cho, Donggyun Kim, Jinwoo Lee, Seunghoon Hong
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated in the DeepMind Control suite, our framework termed Meta-Controller demonstrates superior few-shot generalization to unseen embodiments and tasks over modular policy learning and few-shot IL approaches. and (Section 5, Experiments) We evaluate the few-shot behavior cloning of unseen embodiments and tasks within the DeepMind Control (DMC) suite [31] |
| Researcher Affiliation | Academia | Seongwoong Cho Donggyun Kim Jinwoo Lee Seunghoon Hong School of Computing, KAIST {seongwoongjo, kdgyun425, bestgenius10, seunghoon.hong}@kaist.ac.kr |
| Pseudocode | No | The paper describes the model architecture and training protocol in prose and through diagrams (e.g., Figure 1, Figure 2, Figure 3), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/SeongwoongCho/meta-controller. |
| Open Datasets | Yes | Environment and Dataset. We evaluate the few-shot behavior cloning of unseen embodiments and tasks within the DeepMind Control (DMC) suite [31], which includes continuous control tasks featuring diverse kinematic structures. and Our meta-training dataset is constructed using a replay buffer of an expert agent [38], consisting of up to 2000 demonstration trajectories for each task and embodiment (see the environment-loading sketch after the table). |
| Dataset Splits | Yes | Few-shot Fine-Tuning. After acquiring the meta-knowledge about continuous control problems, we apply our Meta-Controller in a few-shot behavior cloning setup, where it should adapt to both unseen embodiments and tasks with a few demonstrations D. To this end, we randomly split D into two disjoint subsets, and fine-tune the model with Eq. (10), but only with respect to the embodiment-specific and task-specific parameters $(p_s^E, \theta_s^E, \theta_m^{(E,T)})$ while freezing the rest (see the parameter-freezing sketch after the table). |
| Hardware Specification | Yes | For meta-training, we train the model with 8 RTX A6000 GPUs for approximately 25 hours, and we fine-tune the model on each task with 1 RTX A6000 GPU for approximately 2 hours. and Table 8 presents the inference time comparisons between our model and VC-1 [26], measured on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | We implemented our model based on PyTorch Lightning [1] which supports both Intel Gaudi-v2 (HABANA) and NVIDIA AI accelerators (CUDA). We provide the code for both systems on separate branches in the GitHub repository. While PyTorch Lightning is mentioned, its specific version number is not provided in the paper text. |
| Experiment Setup | Yes | All models are trained for 200,000 iterations using the Adam optimizer [22] and a poly learning rate scheduler [24] with a base learning rate of 2×10⁻⁴. After training, we fine-tune all models for 10,000 iterations with a fixed learning rate of 2×10⁻⁴... and Table 5: Hyper-parameters of Meta-Controller used in our experiments: number of demonstrations used in each episode 4; global batch size 64; hidden dimension 512; attention heads 4; low-rank for structure encoder f_s 16; low-rank for motion encoder f_m 16; LayerScale initialization 1; training iterations 200,000; learning rate warmup iterations 1,000; base learning rate 2×10⁻⁴ (see the optimizer/scheduler sketch after the table). |
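
For context on the Open Datasets row, here is a minimal sketch of loading a DeepMind Control suite task with the `dm_control` package. The `walker`/`walk` domain/task pair is an illustrative assumption, not a task list taken from the paper.

```python
from dm_control import suite

# Load one continuous-control task from the DMC suite.
# The walker/walk pair is an arbitrary example for illustration.
env = suite.load(domain_name="walker", task_name="walk")

timestep = env.reset()
action_spec = env.action_spec()
print(action_spec.shape, action_spec.minimum, action_spec.maximum)
```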
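The Dataset Splits row describes the few-shot fine-tuning recipe: randomly split the demonstrations D into two disjoint subsets and update only the embodiment- and task-specific parameters $(p_s^E, \theta_s^E, \theta_m^{(E,T)})$ while freezing the rest. Below is a minimal PyTorch-style sketch of that recipe; the parameter-group names are hypothetical placeholders and do not come from the released code.

```python
import random
import torch

def split_demonstrations(demos, seed=0):
    """Randomly split the demonstration set D into two disjoint subsets,
    as described for few-shot fine-tuning."""
    demos = list(demos)
    random.Random(seed).shuffle(demos)
    half = len(demos) // 2
    return demos[:half], demos[half:]

# Hypothetical parameter-group names standing in for (p_s^E, theta_s^E, theta_m^{(E,T)}).
TRAINABLE_KEYS = ("structure_prompt", "structure_encoder", "motion_matching")

def freeze_except(model: torch.nn.Module, trainable_keys=TRAINABLE_KEYS):
    """Freeze all parameters except the embodiment-/task-specific groups,
    and return the parameters that remain trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in trainable_keys)
    return [p for p in model.parameters() if p.requires_grad]
```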
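The Experiment Setup row quotes Adam with a poly learning-rate schedule, a base learning rate of 2×10⁻⁴, 200,000 training iterations, and 1,000 warmup iterations. A minimal sketch of that configuration in PyTorch is shown below; the poly power (0.9) and the linear warmup shape are assumptions not stated in the quoted text.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=2e-4, total_iters=200_000,
                                  warmup_iters=1_000, power=0.9):
    """Adam with linear warmup followed by polynomial ('poly') decay.
    The poly power of 0.9 is an assumed value."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

    def lr_lambda(step):
        if step < warmup_iters:
            return step / max(1, warmup_iters)            # linear warmup
        progress = min(1.0, (step - warmup_iters) / max(1, total_iters - warmup_iters))
        return (1.0 - progress) ** power                  # poly decay toward 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Per-iteration usage: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```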