Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning

Authors: Yunke Wang, Bo Du, Chang Xu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on MuJoCo and RoboSuite platforms demonstrate the effectiveness of our method from different aspects. In this section, we conduct experiments to verify the effectiveness of UID in various benchmarks (i.e., MuJoCo (Todorov, Erez, and Tassa 2012) and Robosuite (Zhu et al. 2020)) under different settings.
Researcher Affiliation | Academia | 1 School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China. 2 School of Computer Science, Faculty of Engineering, The University of Sydney, Australia.
Pseudocode | Yes | Algorithm 1: UID-GAIL (a hedged sketch of the adversarial training step appears after the table).
Open Source Code | Yes | https://github.com/yunke-wang/UID
Open Datasets | Yes | We first evaluate UID on three MuJoCo (Todorov, Erez, and Tassa 2012) locomotion tasks (i.e., Ant-v2, HalfCheetah-v2 and Walker2d-v2). We also conduct experiments on a robot control task in Robosuite (Zhu et al. 2020). We use real-world demonstrations by human operators from the RoboTurk website (https://roboturk.stanford.edu/dataset_sim.html). A minimal environment-setup sketch follows the table.
Dataset Splits | No | No explicit details about train/validation/test dataset splits (e.g., percentages, sample counts for each split, or references to predefined splits with citations) were found.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running the experiments were provided.
Software Dependencies | No | The paper mentions the MuJoCo and Robosuite platforms but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We evaluate the agent every 5,000 transitions during training, and the reported results are the average of the last 100 evaluations. We add Gaussian noise ξ to the action a of πo to form a non-optimal expert πn. The action of πn is modeled as N(a, ξ²), and we choose ξ ∈ {0.25, 0.4, 0.6} for these three non-optimal policies. This construction is transcribed in the final sketch after the table.
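
The pseudocode row points to Algorithm 1 (UID-GAIL). As a rough illustration only, here is a minimal GAIL-style discriminator update in PyTorch in which the unlabeled imperfect demonstrations enter the objective through a hypothetical mixing weight `eta`. The names `Discriminator`, `discriminator_step`, and `eta` are assumptions; the paper's actual UID objective re-weights the unlabeled samples differently and should be taken from the released code.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """GAIL-style discriminator over (state, action) pairs."""
    def __init__(self, obs_dim, act_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # raw logit: positive => expert-like
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def discriminator_step(disc, opt, expert_batch, unlabeled_batch,
                       policy_batch, eta=0.5):
    """One adversarial update. Optimal demos are positives and agent
    rollouts are negatives; the unlabeled imperfect demos contribute to
    both terms via the hypothetical weight `eta` (an assumption, not the
    paper's exact formulation)."""
    bce = nn.BCEWithLogitsLoss()
    d_exp = disc(*expert_batch)      # (obs, act) from optimal demos
    d_unl = disc(*unlabeled_batch)   # unlabeled imperfect demos
    d_pol = disc(*policy_batch)      # current agent rollouts
    loss = (bce(d_exp, torch.ones_like(d_exp))
            + bce(d_pol, torch.zeros_like(d_pol))
            + eta * bce(d_unl, torch.ones_like(d_unl))
            + (1.0 - eta) * bce(d_unl, torch.zeros_like(d_unl)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # The policy is then updated (e.g. with TRPO/PPO) on the surrogate
    # reward r(s, a) = -log(1 - sigmoid(D(s, a))), as in standard GAIL.
    return loss.item()
```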
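For the dataset and evaluation rows, a minimal sketch of constructing the three named MuJoCo tasks and of the stated evaluation cadence. It assumes the legacy `gym` + `mujoco-py` stack that the `-v2` task IDs imply; `policy` and `train_and_evaluate` are hypothetical stand-ins, and the agent-update step is elided.

```python
import gym
import numpy as np

# The three locomotion tasks named in the paper; "-v2" IDs belong to
# the legacy gym + mujoco-py stack (Gymnasium renamed these to "-v4").
TASKS = ["Ant-v2", "HalfCheetah-v2", "Walker2d-v2"]

def rollout_return(env, policy):
    """One evaluation episode; `policy` maps an observation to an action."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

def train_and_evaluate(env, policy, total_transitions, eval_every=5_000):
    """Evaluate every 5,000 transitions, as the paper states, and report
    the average of the last 100 evaluations."""
    returns = []
    for step in range(1, total_transitions + 1):
        # ... collect one transition and update the agent here ...
        if step % eval_every == 0:
            returns.append(rollout_return(env, policy))
    return float(np.mean(returns[-100:]))
```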
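Finally, the experiment-setup row describes how the non-optimal experts are built, and the description transcribes almost directly into code. `optimal_policy` is a hypothetical stand-in for the trained expert πo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise scales for the three non-optimal demonstrators, as given in the paper.
XIS = [0.25, 0.4, 0.6]

def non_optimal_action(optimal_policy, obs, xi):
    """pi_n draws its action from N(a, xi^2), where a = pi_o(obs)."""
    a = np.asarray(optimal_policy(obs))
    return a + rng.normal(0.0, xi, size=a.shape)
```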