Generalizable Imitation Learning from Observation via Inferring Goal Proximity
Authors: Youngwoon Lee, Andrew Szot, Shao-Hua Sun, Joseph J. Lim
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our proposed method can robustly generalize compared to prior imitation learning methods on a set of goal-directed tasks in navigation, locomotion, and robotic manipulation, even with demonstrations that cover only a part of the states. Our extensive experiments show that the policy learned with the goal proximity function generalizes better than the state-of-the-art LfO algorithms on various goal-directed tasks, including navigation, locomotion, and robotic manipulation. |
| Researcher Affiliation | Collaboration | Youngwoon Lee¹, Andrew Szot², Shao-Hua Sun¹, Joseph J. Lim¹ (¹University of Southern California, ²Georgia Institute of Technology). This work was partially carried out during an internship at NAVER AI Lab. AI Advisor at NAVER AI Lab. |
| Pseudocode | Yes | We jointly train the proximity function and policy as described in appendix, Algorithm 1. (An illustrative sketch of this joint training loop appears below the table.) |
| Open Source Code | No | The paper provides a general project website link (https://clvrai.com/gpil) but no explicit statement or direct link to a source-code repository for the methodology described in the paper. |
| Open Datasets | Yes | We further evaluate our method in MAZE2D [15] with the medium maze... We evaluate our method in two robotic manipulation tasks with the 7-DoF Fetch robotics arm: FETCH PICK and FETCH PUSH [34]. We evaluate our method in a challenging in-hand object manipulation task [34], HAND ROTATE... |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits for its experiments. It describes how demonstrations are collected (e.g., 25%, 50% coverage) but not data splits for model training and evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud computing instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using PPO [40] for policy optimization and references PyTorch [32], but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use PPO [40], which is widely used in LfO and LfD methods, and its hyperparameters are tuned for each method and task (see appendix, Table 2). |
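
The "Pseudocode" and "Experiment Setup" rows note that the proximity function and policy are trained jointly (Algorithm 1 in the paper's appendix) with PPO as the policy optimizer, but no source code is released (see the "Open Source Code" row). Below is a minimal, illustrative sketch of that joint-training idea, not the authors' implementation: it assumes flat float-vector states, a plain MSE regression loss, and a single proximity network (the paper's ensemble and uncertainty penalty are omitted), and all names such as `ProximityNet`, `proximity_targets`, and `proximity_reward` are hypothetical. The PPO update itself is left to any off-the-shelf implementation.

```python
# Sketch only: joint training of a goal proximity function and a policy,
# assuming flat state vectors and a separate PPO implementation for the policy.
import torch
import torch.nn as nn


class ProximityNet(nn.Module):
    """Predicts goal proximity in [0, 1] from a raw state observation."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)


def proximity_targets(traj_len: int, discount: float = 0.95) -> torch.Tensor:
    """Target proximity discount**(steps to go) for each step of a successful trajectory."""
    steps_to_go = torch.arange(traj_len - 1, -1, -1, dtype=torch.float32)
    return discount ** steps_to_go


def proximity_reward(f: ProximityNet, states: torch.Tensor) -> torch.Tensor:
    """Dense policy reward: increase in predicted proximity between consecutive steps."""
    with torch.no_grad():
        prox = f(states)             # shape (T,)
    return prox[1:] - prox[:-1]      # shape (T - 1,)


def update_proximity(f, optimizer, demo_states, agent_states, agent_success,
                     discount: float = 0.95) -> float:
    """One supervised update on a demo trajectory and an agent trajectory (both time-ordered):
    demo states get discounted targets, failed agent states get target 0."""
    demo_loss = ((f(demo_states) - proximity_targets(len(demo_states), discount)) ** 2).mean()
    agent_targets = (proximity_targets(len(agent_states), discount)
                     if agent_success else torch.zeros(len(agent_states)))
    agent_loss = ((f(agent_states) - agent_targets) ** 2).mean()
    loss = demo_loss + agent_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Joint training loop (policy update elided):
#   1. pretrain the proximity net on demonstration states only,
#   2. roll out the current policy and relabel its rewards with proximity_reward,
#   3. run a PPO update on the relabeled rollout,
#   4. call update_proximity on a mix of demo and agent trajectories, and repeat.
```

Per the quoted setup, the actual discount for proximity targets and the PPO hyperparameters are tuned per method and task (appendix, Table 2 of the paper), so the defaults above are placeholders.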