Generalizable Imitation Learning from Observation via Inferring Goal Proximity

Authors: Youngwoon Lee, Andrew Szot, Shao-Hua Sun, Joseph J. Lim

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our proposed method can robustly generalize compared to prior imitation learning methods on a set of goal-directed tasks in navigation, locomotion, and robotic manipulation, even with demonstrations that cover only a part of the states. Our extensive experiments show that the policy learned with the goal proximity function generalizes better than the state-of-the-art LfO algorithms on various goal-directed tasks, including navigation, locomotion, and robotic manipulation.
Researcher Affiliation | Collaboration | Youngwoon Lee¹, Andrew Szot², Shao-Hua Sun¹, Joseph J. Lim¹; ¹University of Southern California, ²Georgia Institute of Technology. This work was partially carried out during an internship at NAVER AI Lab; one of the authors is an AI Advisor at NAVER AI Lab.
Pseudocode | Yes | We jointly train the proximity function and policy as described in Algorithm 1 in the appendix. (A minimal illustrative sketch of this joint training loop appears after this table.)
Open Source Code | No | The paper provides a general project website link (https://clvrai.com/gpil) but no explicit statement or direct link to a source-code repository for the methodology described in the paper.
Open Datasets | Yes | We further evaluate our method in MAZE2D [15] with the medium maze... We evaluate our method in two robotic manipulation tasks with the 7-DoF Fetch robotics arm: FETCH PICK and FETCH PUSH [34]. We evaluate our method in a challenging in-hand object manipulation task [34], HAND ROTATE...
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits for its experiments. It describes how demonstrations are collected (e.g., 25% or 50% coverage) but not data splits for model training and evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, or cloud computing instance types) used to run the experiments.
Software Dependencies | No | The paper mentions using PPO [40] for policy optimization and references PyTorch [32], but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use PPO [40], which is widely used in LfO and LfD methods, and its hyperparameters are tuned for each method and task (see Table 2 in the appendix).
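
For context on the Pseudocode row above, the sketch below illustrates one way the joint training of a goal proximity function and a policy could look in PyTorch: a proximity network is regressed toward discounted proximity targets on expert demonstration states and toward outcome-based targets on the agent's own rollouts, while the policy is rewarded for increases in predicted proximity and optimized with any standard PPO implementation. This is a minimal sketch under stated assumptions, not the authors' released code: the names (ProximityFunction, train_proximity_step, proximity_reward), the exponential target delta**(T-1-t), and the zero targets for failed agent rollouts are illustrative choices.

```python
# Minimal sketch (assumptions noted in comments) of jointly training a goal
# proximity function and using it to reward a policy. The PPO update itself is
# left to any off-the-shelf implementation.
import torch
import torch.nn as nn


class ProximityFunction(nn.Module):
    """Predicts a scalar goal proximity in [0, 1] from an observation."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


def proximity_targets(traj_len: int, delta: float = 0.95) -> torch.Tensor:
    # Assumed target: discounted proximity, 1 at the final (goal) state and
    # decaying exponentially toward earlier states.
    t = torch.arange(traj_len, dtype=torch.float32)
    return delta ** (traj_len - 1 - t)


def train_proximity_step(f, optimizer, expert_obs, agent_obs,
                         agent_success, delta=0.95):
    """One regression step on expert states (discounted targets) and agent
    states (assumption: zero targets for failed rollouts)."""
    expert_target = proximity_targets(expert_obs.shape[0], delta)
    agent_target = (proximity_targets(agent_obs.shape[0], delta)
                    if agent_success else torch.zeros(agent_obs.shape[0]))
    loss = ((f(expert_obs) - expert_target) ** 2).mean() + \
           ((f(agent_obs) - agent_target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def proximity_reward(f, obs, next_obs):
    # Shaped reward for the policy: increase in predicted goal proximity.
    with torch.no_grad():
        return f(next_obs) - f(obs)
```

In an outer loop, the agent's rollouts would be relabeled with `proximity_reward` and passed to a PPO update, alternating with `train_proximity_step` on a mix of demonstration and agent batches, mirroring the joint training described in the paper's Algorithm 1.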