Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning
Authors: Yunke Wang, Bo Du, Chang Xu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on MuJoCo and RoboSuite platforms demonstrate the effectiveness of our method from different aspects. In this section, we conduct experiments to verify the effectiveness of UID in various benchmarks (i.e., MuJoCo (Todorov, Erez, and Tassa 2012) and RoboSuite (Zhu et al. 2020)) under different settings. |
| Researcher Affiliation | Academia | ¹School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China. ²School of Computer Science, Faculty of Engineering, The University of Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1: UID-GAIL |
| Open Source Code | Yes | https://github.com/yunke-wang/UID |
| Open Datasets | Yes | We evaluate UID on three MuJoCo (Todorov, Erez, and Tassa 2012) locomotion tasks (i.e., Ant-v2, HalfCheetah-v2 and Walker2d-v2) firstly. We also conduct experiments on a robot control task in RoboSuite (Zhu et al. 2020). We use real-world demonstrations by human operators from the RoboTurk website: https://roboturk.stanford.edu/dataset_sim.html |
| Dataset Splits | No | No explicit details about train/validation/test dataset splits (e.g., percentages, sample counts for each split, or references to predefined splits with citations) were found. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions the 'MuJoCo' and 'RoboSuite' platforms but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We evaluate the agent every 5,000 transitions in training and the reported results are the average of the last 100 evaluations. We add Gaussian noise ξ to the action a of the optimal policy π_o to form the non-optimal expert π_n. The action of π_n is modeled as a_n ∼ N(a, ξ²) and we choose ξ = [0.25, 0.4, 0.6] for these 3 non-optimal policies. (A sketch of this noise-injection setup is given below the table.) |
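
The noise-injection setup in the Experiment Setup row can be illustrated with a short script. The sketch below is an assumption-based illustration, not the authors' released code (see the GitHub link above for that): it uses an old-style Gym MuJoCo environment (Walker2d-v2, one of the paper's benchmarks) and a placeholder `optimal_policy` standing in for the trained expert π_o, and it forms non-optimal demonstrators π_n whose actions follow a_n ∼ N(a, ξ²) for ξ ∈ {0.25, 0.4, 0.6}.

```python
import numpy as np
import gym

env = gym.make("Walker2d-v2")
rng = np.random.default_rng(0)

def optimal_policy(obs):
    # Stand-in for pi_o; in the paper this would be a policy trained with the
    # true environment reward. Here it returns a zero action so the sketch runs.
    return np.zeros(env.action_space.shape, dtype=np.float32)

def make_noisy_policy(policy, xi):
    # Non-optimal expert pi_n: its action a_n ~ N(a, xi^2), where a is the
    # action of pi_o, clipped back into the valid action range.
    def pi_n(obs):
        a = policy(obs)
        a_n = a + rng.normal(0.0, xi, size=a.shape)
        return np.clip(a_n, env.action_space.low, env.action_space.high)
    return pi_n

def collect_demonstrations(policy, num_transitions):
    # Roll out a policy and record (state, action) pairs as demonstrations.
    demos, obs = [], env.reset()
    for _ in range(num_transitions):
        act = policy(obs)
        demos.append((obs, act))
        obs, reward, done, info = env.step(act)
        if done:
            obs = env.reset()
    return demos

# Three non-optimal demonstrators with increasing noise levels xi.
noisy_demos = {
    xi: collect_demonstrations(make_noisy_policy(optimal_policy, xi), 1000)
    for xi in (0.25, 0.4, 0.6)
}
```

In the paper's setting, demonstrations from such noisy policies would be mixed with (unlabeled) optimal demonstrations to form the imperfect demonstration pool used to train UID-GAIL; the sketch only covers how the non-optimal demonstrators themselves are generated.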