Generalizable Imitation Learning from Observation via Inferring Goal Proximity
Authors: Youngwoon Lee, Andrew Szot, Shao-Hua Sun, Joseph J. Lim
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our proposed method can robustly generalize compared to prior imitation learning methods on a set of goal-directed tasks in navigation, locomotion, and robotic manipulation, even with demonstrations that cover only a part of the states. Our extensive experiments show that the policy learned with the goal proximity function generalizes better than the state-of-the-art LfO algorithms on various goal-directed tasks, including navigation, locomotion, and robotic manipulation. |
| Researcher Affiliation | Collaboration | Youngwoon Lee¹, Andrew Szot², Shao-Hua Sun¹, Joseph J. Lim¹ (¹University of Southern California, ²Georgia Institute of Technology). This work was partially carried out during an internship at NAVER AI Lab. AI Advisor at NAVER AI Lab. |
| Pseudocode | Yes | We jointly train the proximity function and policy as described in appendix, Algorithm 1. (An illustrative sketch of this joint training loop appears below the table.) |
| Open Source Code | No | The paper provides a general project website link (https://clvrai.com/gpil) but no explicit statement or direct link to a source-code repository for the methodology described in the paper. |
| Open Datasets | Yes | We further evaluate our method in MAZE2D [15] with the medium maze... We evaluate our method in two robotic manipulation tasks with the 7-DoF Fetch robotics arm: FETCH PICK and FETCH PUSH [34]. We evaluate our method in a challenging in-hand object manipulation task [34], HAND ROTATE... |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test dataset splits for its experiments. It describes how demonstrations are collected (e.g., 25%, 50% coverage) but not data splits for model training and evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud computing instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using PPO [40] for policy optimization and references PyTorch [32], but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use PPO [40], which is widely used in LfO and LfD methods, and its hyperparameters are tuned for each method and task (see appendix, Table 2). |
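
The "Pseudocode" and "Experiment Setup" rows note that the proximity function and policy are trained jointly (Algorithm 1 in the paper's appendix) with PPO as the policy optimizer, but no source code is released (see the "Open Source Code" row). Below is a minimal, illustrative sketch of that joint-training idea, not the authors' implementation: it assumes flat float-vector states, a plain MSE regression loss, and a single proximity network (the paper's ensemble and uncertainty penalty are omitted), and all names such as `ProximityNet`, `proximity_targets`, and `proximity_reward` are hypothetical. The PPO update itself is left to any off-the-shelf implementation.

```python
# Sketch only: joint training of a goal proximity function and a policy,
# assuming flat state vectors and a separate PPO implementation for the policy.
import torch
import torch.nn as nn


class ProximityNet(nn.Module):
    """Predicts goal proximity in [0, 1] from a raw state observation."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)


def proximity_targets(traj_len: int, discount: float = 0.95) -> torch.Tensor:
    """Target proximity discount**(steps to go) for each step of a successful trajectory."""
    steps_to_go = torch.arange(traj_len - 1, -1, -1, dtype=torch.float32)
    return discount ** steps_to_go


def proximity_reward(f: ProximityNet, states: torch.Tensor) -> torch.Tensor:
    """Dense policy reward: increase in predicted proximity between consecutive steps."""
    with torch.no_grad():
        prox = f(states)             # shape (T,)
    return prox[1:] - prox[:-1]      # shape (T - 1,)


def update_proximity(f, optimizer, demo_states, agent_states, agent_success,
                     discount: float = 0.95) -> float:
    """One supervised update on a demo trajectory and an agent trajectory (both time-ordered):
    demo states get discounted targets, failed agent states get target 0."""
    demo_loss = ((f(demo_states) - proximity_targets(len(demo_states), discount)) ** 2).mean()
    agent_targets = (proximity_targets(len(agent_states), discount)
                     if agent_success else torch.zeros(len(agent_states)))
    agent_loss = ((f(agent_states) - agent_targets) ** 2).mean()
    loss = demo_loss + agent_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Joint training loop (policy update elided):
#   1. pretrain the proximity net on demonstration states only,
#   2. roll out the current policy and relabel its rewards with proximity_reward,
#   3. run a PPO update on the relabeled rollout,
#   4. call update_proximity on a mix of demo and agent trajectories, and repeat.
```

Per the quoted setup, the actual discount for proximity targets and the PPO hyperparameters are tuned per method and task (appendix, Table 2 of the paper), so the defaults above are placeholders.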