Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

Authors: Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, Yanan Sui

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments. Our results show that CAIL significantly outperforms other imitation learning methods from demonstrations with varying optimality."
Researcher Affiliation | Academia | Songyuan Zhang¹, Zhangjie Cao², Dorsa Sadigh², Yanan Sui¹; ¹National Engineering Lab for Neuromodulation, SAE, Tsinghua University, China; ²Department of Computer Science, Stanford University, USA. Contact: szhang21@mit.edu, {caozj,dorsa}@cs.stanford.edu, ysui@tsinghua.edu.cn
Pseudocode | No | The paper describes its optimization procedure with equations and textual descriptions but does not provide a formally labeled pseudocode block or algorithm. (An illustrative, hedged sketch of such a bi-level loop is given after the table.)
Open Source Code | No | The paper states "The code is available on our website," but the provided link (https://sites.google.com/view/cail) points to a general project website rather than directly to a source-code repository (e.g., GitHub, GitLab).
Open Datasets | Yes | "We conduct experiments in four environments including two MuJoCo environments (Reacher and Ant) [28] in OpenAI Gym [7], one Franka Panda Arm simulation environment, and one real robot environment with a UR5e robot arm." (A hedged sketch instantiating the two MuJoCo environments follows the table.)
Dataset Splits | Yes | "In our implementation, we use a limited amount of ranked demonstrations as our evaluation data for the outer loss... We label only 5% of the demonstrated trajectories with rankings since we target realistic settings where only a small number of rankings are available for the demonstrations." (A hypothetical splitting sketch follows the table.)
Hardware Specification | No | The paper mentions simulated environments (MuJoCo, Franka Panda Arm) and a real robot arm (UR5e), but does not specify any hardware used for computation (e.g., CPU or GPU models, or memory).
Software Dependencies | No | "For the RL algorithm, we use SAC [16] for the Reacher environment and PPO [24] for the Ant environment." The algorithms are named, but the specific software libraries and their version numbers are not provided.
Experiment Setup | Yes | "We collect 200 trajectories in total, where each trajectory has 50 interaction steps... We collect trajectories with 200,000 interaction steps in total... We label only 5% of the demonstrated trajectories with rankings... In all the experiments, we use ϵ = 10⁻⁵." (These values are gathered into an illustrative configuration sketch after the table.)
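
Since the paper contains no formally labeled algorithm block, the snippet below is a minimal, hypothetical sketch of the bi-level structure the entries above describe: an inner imitation step weighted by per-trajectory confidence and an outer step that fits the confidence scores to the small set of ranked trajectories. All names are invented, the inner objective is plain behavior cloning rather than the adversarial imitation objective CAIL builds on, and the outer loss is simplified to act directly on the confidence scores; it is an illustration of the idea, not the authors' algorithm.

```python
# Hypothetical, simplified sketch of a bi-level confidence-weighted imitation
# loop. "Policy", the behavior-cloning inner loss, and the pairwise outer loss
# are illustrative stand-ins, not the objectives used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))

    def forward(self, obs):
        return self.net(obs)


def train(demos, ranked_pairs, obs_dim, act_dim, epochs=100):
    """demos: list of (obs, act) tensors, one entry per trajectory.
    ranked_pairs: (i, j) trajectory indices with trajectory i ranked above j."""
    policy = Policy(obs_dim, act_dim)
    conf_logits = torch.zeros(len(demos), requires_grad=True)  # one score per trajectory
    opt_policy = torch.optim.Adam(policy.parameters(), lr=1e-3)
    opt_conf = torch.optim.Adam([conf_logits], lr=1e-2)

    for _ in range(epochs):
        # Inner step: confidence-weighted imitation (here: behavior cloning).
        conf = torch.sigmoid(conf_logits).detach()
        inner_loss = sum(conf[k] * F.mse_loss(policy(obs), act)
                         for k, (obs, act) in enumerate(demos)) / len(demos)
        opt_policy.zero_grad()
        inner_loss.backward()
        opt_policy.step()

        # Outer step: push confidence to respect the few available rankings
        # (higher-ranked trajectories should receive higher confidence).
        if ranked_pairs:
            conf = torch.sigmoid(conf_logits)
            outer_loss = sum(F.softplus(conf[j] - conf[i]) for i, j in ranked_pairs)
            opt_conf.zero_grad()
            outer_loss.backward()
            opt_conf.step()

    return policy, torch.sigmoid(conf_logits).detach()
```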
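
The two MuJoCo tasks can be instantiated through OpenAI Gym as sketched below. The environment IDs and the stable-baselines3 import are assumptions (the paper names the algorithms SAC and PPO but no library or version numbers), and the Franka Panda and UR5e environments are not covered here.

```python
# Assumed OpenAI Gym environment IDs for the two MuJoCo tasks; stable-baselines3
# is an assumed implementation of SAC/PPO, not a dependency named by the paper.
import gym
from stable_baselines3 import PPO, SAC

reacher_env = gym.make("Reacher-v2")  # MuJoCo Reacher (ID assumed)
ant_env = gym.make("Ant-v2")          # MuJoCo Ant (ID assumed)

sac_agent = SAC("MlpPolicy", reacher_env)  # SAC [16] used for Reacher
ppo_agent = PPO("MlpPolicy", ant_env)      # PPO [24] used for Ant
```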
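
A minimal sketch of the split described in the Dataset Splits entry, assuming each trajectory comes with an oracle return that can serve as the ranking key; the 5% figure is from the paper, while the function name, the pairwise-ranking format, and the use of returns are illustrative assumptions.

```python
import random


def split_with_rankings(trajectories, returns, ranked_fraction=0.05, seed=0):
    """Label a small fraction of trajectories with pairwise rankings, to serve
    as evaluation data for the outer loss; the remaining trajectories stay
    unranked. Assumes returns[i] is the (oracle) return of trajectory i."""
    rng = random.Random(seed)
    idx = list(range(len(trajectories)))
    rng.shuffle(idx)
    n_ranked = max(2, int(ranked_fraction * len(trajectories)))
    # Sort the sampled subset from best to worst by return.
    ranked_idx = sorted(idx[:n_ranked], key=lambda i: returns[i], reverse=True)
    # Consecutive pairs (i, j) mean trajectory i is ranked above trajectory j.
    ranked_pairs = [(ranked_idx[k], ranked_idx[k + 1])
                    for k in range(len(ranked_idx) - 1)]
    unranked_idx = idx[n_ranked:]
    return ranked_pairs, unranked_idx
```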
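
Finally, the quoted setup figures can be gathered into one place for reference. The field names below are invented, and assigning the 200-trajectory figure to Reacher and the 200,000-step figure to Ant follows the order of the quoted sentences, which is an assumption; the numeric values themselves are taken from the paper.

```python
# Values quoted from the paper's experiment setup; key names are illustrative,
# and mapping the figures to Reacher vs. Ant is an assumption.
EXPERIMENT_SETUP = {
    "reacher": {
        "num_trajectories": 200,      # "200 trajectories in total"
        "steps_per_trajectory": 50,   # "each trajectory has 50 interaction steps"
    },
    "ant": {
        "total_interaction_steps": 200_000,  # "200,000 interaction steps in total"
    },
    "ranked_fraction": 0.05,  # only 5% of trajectories labeled with rankings
    "epsilon": 1e-5,          # "In all the experiments, we use ϵ = 10^-5"
}
```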