Hierarchical Few-Shot Imitation with Skill Transition Models
Authors: Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments we are interested in answering the following questions: (i) Can our method successfully imitate unseen long-horizon downstream demonstrations? (ii) What is the importance of the semi-parametric approach vs. future conditioning? (iii) Is pre-training and fine-tuning the skill embedding model necessary for achieving a high success rate? (see also Sections 4.2 RESULTS and 4.3 ABLATION STUDIES) |
| Researcher Affiliation | -1 | Anonymous authors. Paper under double-blind review. |
| Pseudocode | Yes | Algorithm 1 FIST: Evaluation Algorithm (a hedged sketch of this evaluation loop appears after the table) |
| Open Source Code | Yes | Our codebase builds upon the SPiRL released code and is located at https://anonymous.4open.science/r/fist-C5DF/README.md. |
| Open Datasets | Yes | The data for Point Maze is collected using the same scripts provided in the D4RL dataset repository Fu et al. (2020). and The data for Ant Maze is solely based on the "ant-large-diverse-v0" dataset in D4RL. |
| Dataset Splits | No | The paper describes training and fine-tuning procedures but does not explicitly provide the training/validation/test dataset splits needed for reproduction. It mentions fine-tuning on D_demo, which consists of 10 expert trajectories, but this is not explicitly called a validation set or part of a formal split. |
| Hardware Specification | Yes | The training for both skill extraction and fine-tuning were done on a single NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer with specific parameters, but does not provide version numbers for the programming language or major libraries used in the implementation. |
| Experiment Setup | Yes | Hyperparameters used for training are listed in Table 3. Contrastive Distance Metric: encoder output dim 32, encoder hidden layers 128, encoder # hidden layers 2, optimizer Adam(β1 = 0.9, β2 = 0.999, LR = 1e-3). Skill extraction: epochs 200, batch size 128, optimizer Adam(β1 = 0.9, β2 = 0.999, LR = 1e-3), H (sub-trajectory length) 10, β = 5e-4 (Kitchen) / 1e-2 (Maze). Skill Encoder: dim-Z in VAE 128, hidden dim 128, # LSTM layers 1. Skill Decoder: hidden dim 128, # hidden layers 5. Inverse Skill Dynamic Model: hidden dim 128, # hidden layers 5. (A hedged architecture sketch based on these values appears after this table.) |
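
The hyperparameters quoted in the Experiment Setup row describe the skill-embedding architecture (contrastive distance-metric encoder, LSTM skill encoder, skill decoder, and inverse skill dynamics model). Below is a minimal PyTorch sketch of how those dimensions could be wired together; the module and variable names (`SkillEncoder`, `skill_decoder`, `inverse_skill_model`, `distance_encoder`, `STATE_DIM`, `ACTION_DIM`) and the state/action sizes are placeholders, not names from the released FIST/SPiRL codebase.

```python
# Hedged sketch: PyTorch modules mirroring the Table 3 hyperparameters quoted above.
# STATE_DIM and ACTION_DIM are placeholders; class and variable names are illustrative.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 60, 9   # placeholder sizes (kitchen-like environment)
H = 10                          # sub-trajectory (skill) length
Z_DIM = 128                     # dim-Z of the skill VAE latent


def mlp(in_dim, out_dim, hidden=128, n_hidden=2):
    """Simple MLP builder used by the modules below."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class SkillEncoder(nn.Module):
    """LSTM posterior q(z | s_{t:t+H}, a_{t:t+H}) -> Gaussian over skills (1 LSTM layer, hidden dim 128)."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(STATE_DIM + ACTION_DIM, 128, num_layers=1, batch_first=True)
        self.mu = nn.Linear(128, Z_DIM)
        self.log_std = nn.Linear(128, Z_DIM)

    def forward(self, states, actions):                 # (B, H, ·) tensors
        _, (h, _) = self.lstm(torch.cat([states, actions], dim=-1))
        return self.mu(h[-1]), self.log_std(h[-1])


# Skill decoder pi(a_t | s_t, z): hidden dim 128, 5 hidden layers.
skill_decoder = mlp(STATE_DIM + Z_DIM, ACTION_DIM, hidden=128, n_hidden=5)

# Inverse skill dynamics q(z | s_t, s_{t+H}): hidden dim 128, 5 hidden layers,
# outputs the mean and log-std of a Gaussian over skills.
inverse_skill_model = mlp(2 * STATE_DIM, 2 * Z_DIM, hidden=128, n_hidden=5)

# Contrastive distance-metric encoder: output dim 32, hidden dim 128, 2 hidden layers.
distance_encoder = mlp(STATE_DIM, 32, hidden=128, n_hidden=2)

# Adam with the betas and learning rate listed in Table 3.
optimizer = torch.optim.Adam(
    list(skill_decoder.parameters()) + list(inverse_skill_model.parameters()),
    lr=1e-3, betas=(0.9, 0.999))
```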
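The Pseudocode row quotes Algorithm 1 (FIST: Evaluation Algorithm). The sketch below paraphrases the semi-parametric evaluation loop described in the paper: look up the nearest demonstration state under the learned distance metric, take the state H steps ahead as the future conditioning, infer a skill with the inverse skill dynamics model, and execute it with the low-level decoder. It reuses the placeholder modules from the previous sketch and assumes a gym-style `env` and a tensor of demonstration states, so it is an illustration of the procedure rather than the authors' implementation.

```python
# Hedged sketch of the semi-parametric evaluation loop summarised by Algorithm 1.
# `env` is assumed to follow the classic gym step() API; `demo_states` is an
# (N, STATE_DIM) tensor of demonstration states. Names are illustrative.
import torch


@torch.no_grad()
def fist_episode(env, demo_states, distance_encoder, inverse_skill_model,
                 skill_decoder, H=10, max_steps=300):
    """Roll out one episode: look up the future state in the demos, infer a
    skill, then execute H low-level actions before re-planning."""
    demo_emb = distance_encoder(demo_states)             # (N, 32) demo embeddings
    obs, steps = env.reset(), 0
    while steps < max_steps:
        s = torch.as_tensor(obs, dtype=torch.float32)
        # 1) Semi-parametric lookup: nearest demo state under the learned metric.
        idx = torch.cdist(distance_encoder(s[None]), demo_emb).argmin().item()
        # 2) Future conditioning: take the state H steps ahead in the demonstration.
        s_future = demo_states[min(idx + H, len(demo_states) - 1)]
        # 3) Infer the skill from (s_t, s_{t+H}); use the predicted mean.
        out = inverse_skill_model(torch.cat([s, s_future])[None])
        z = out[:, :out.shape[1] // 2]
        # 4) Execute the skill for H steps with the low-level decoder.
        for _ in range(H):
            s = torch.as_tensor(obs, dtype=torch.float32)
            a = skill_decoder(torch.cat([s, z.squeeze(0)])[None]).squeeze(0)
            obs, _, done, _ = env.step(a.numpy())        # classic 4-tuple gym API
            steps += 1
            if done or steps >= max_steps:
                return obs
    return obs
```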