Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations
Authors: Yunke Wang, Chang Xu, Bo Du
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Mujoco demonstrate the great performance of our proposed method over other GAIL-based methods when dealing with imperfect demonstrations. |
| Researcher Affiliation | Academia | 1 National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China 2 School of Computer Science, Faculty of Engineering, The University of Sydney, Australia |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code. |
| Open Datasets | Yes | We conduct experiments to evaluate our proposed method on four Mujoco [Todorov et al., 2012] continuous control tasks with two kinds of imperfect demonstrations, i.e. suboptimal demonstrations (stage 1) and near-optimal demonstrations (stage 2). |
| Dataset Splits | No | The paper mentions 'evaluate the agent every 5,000 transitions in training' and 'conduct pre-training on WGAIL with about 10% of total interactions', but it does not specify a distinct validation dataset split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The reward function rϕ in WGAIL and SAIL is constrained into [0, 5] by a sigmoid function. As [Kumar et al., 2010] suggested, we conduct pre-training on WGAIL with about 10% of total interactions before the weight learning step in SAIL. The threshold K is initialized such that half of the demonstrations can be included. We evaluate the agent every 5,000 transitions in training and the reported result in Table 1 is the average of the last 100 evaluations. Also, we conduct our experiment with five random seeds. |
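
The Experiment Setup row quotes three concrete settings: a reward squashed into [0, 5] by a sigmoid, a threshold K initialized so that half of the demonstrations are included, and a reported score averaged over the last 100 evaluations. The sketch below illustrates how such settings might look in Python. It is not the authors' code (the paper releases none); the function names, the use of a per-demonstration loss list, and the choice of the median for K are assumptions made here for illustration only.

```python
import math
from statistics import mean, median

REWARD_MAX = 5.0  # upper end of the [0, 5] reward range quoted in the table


def bounded_reward(logit: float, reward_max: float = REWARD_MAX) -> float:
    """Map an unbounded score to [0, reward_max] with a scaled sigmoid.

    `logit` stands in for the raw output of the reward network (assumed name).
    """
    # Numerically stable sigmoid: never exponentiate a large positive value.
    if logit >= 0:
        s = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        s = e / (1.0 + e)
    return reward_max * s


def initial_threshold(per_demo_losses: list[float]) -> float:
    """Assumed initialization: the median loss, so that roughly half of the
    demonstrations fall at or below the threshold K."""
    return median(per_demo_losses)


def reported_score(eval_returns: list[float], last_n: int = 100) -> float:
    """Average of the last `last_n` evaluation returns (all of them if fewer)."""
    if not eval_returns:
        raise ValueError("no evaluations recorded")
    return mean(eval_returns[-last_n:])
```

With five random seeds, one would presumably compute `reported_score` per seed and then aggregate across seeds; the table does not say how the aggregation is done, so that step is left out.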