Planning for Sample Efficient Imitation Learning
Authors: Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark our method not only on the state-based DeepMind Control Suite, but also on the image version, which many previous works find highly challenging. Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency. EI shows over a 4x gain in performance in the limited-sample setting on state-based and image-based tasks and can solve challenging problems like Humanoid, where previous methods fail with a small number of interactions. |
| Researcher Affiliation | Academia | Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao. HKUST, Tsinghua University, Shanghai Qi Zhi Institute. zhaoheng.yin@connect.ust.hk, cqf@ust.hk, ywr20@mails.tsinghua.edu.cn, gaoyangiiis@tsinghua.edu.cn |
| Pseudocode | No | The paper includes diagrams (Figure 1) and describes procedures in text, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/zhaohengyin/EfficientImitate. We open-source the code at https://github.com/zhaohengyin/EfficientImitate to facilitate future research. |
| Open Datasets | Yes | We use the DeepMind Control Suite [44] for evaluation. |
| Dataset Splits | No | The paper specifies the number of demonstrations used and online steps for experiments, but it does not provide explicit training/validation/test dataset splits (e.g., percentages or counts for distinct sets beyond expert demonstrations). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | α is a mixture factor, which is fixed during training, and π is the current policy. We use α = 0.25 in this paper. We use the Reanalyze algorithm [40, 48] for offline training, and we require that all the samples be reanalyzed. The default values of K and N in the previous experiments are 16 and 50. We sweep K ∈ {4, 8, 16, 24} and N ∈ {5, 10, 25, 50} to evaluate their effects. |
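
For anyone attempting a rerun, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only: the field names (`mixture_alpha`, `k_default`, `reanalyze_ratio`, etc.) are our own and do not come from the EfficientImitate codebase; only the values (α = 0.25, K default 16 with sweep {4, 8, 16, 24}, N default 50 with sweep {5, 10, 25, 50}, and full reanalysis of samples) are taken from the paper's text.

```python
# Hypothetical configuration sketch for the reported setup.
# Field names are invented for illustration; only the numeric values
# come from the paper's quoted hyperparameters.
from dataclasses import dataclass


@dataclass
class EISetup:
    mixture_alpha: float = 0.25        # fixed mixture factor alpha
    k_default: int = 16                # default K in the main experiments
    n_default: int = 50                # default N in the main experiments
    k_sweep: tuple = (4, 8, 16, 24)    # ablation grid for K
    n_sweep: tuple = (5, 10, 25, 50)   # ablation grid for N
    reanalyze_ratio: float = 1.0       # "all the samples should be reanalyzed"


def ablation_grid(cfg: EISetup):
    """Enumerate the (K, N) pairs covered by the reported sweep."""
    return [(k, n) for k in cfg.k_sweep for n in cfg.n_sweep]


if __name__ == "__main__":
    cfg = EISetup()
    print(f"default (K, N) = ({cfg.k_default}, {cfg.n_default})")
    print(f"sweep size = {len(ablation_grid(cfg))} configurations")
```

Note that this captures only what the paper states explicitly; remaining training details (network sizes, learning rates, replay settings) would need to be recovered from the released repository.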