Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations
Authors: Yunke Wang, Chang Xu, Bo Du
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Mujoco demonstrate the great performance of our proposed method over other GAIL-based methods when dealing with imperfect demonstrations. |
| Researcher Affiliation | Academia | 1 National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China 2 School of Computer Science, Faculty of Engineering, The University of Sydney, Australia |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code. |
| Open Datasets | Yes | We conduct experiments to evaluate our proposed method on four Mujoco [Todorov et al., 2012] continuous control tasks with two kinds of imperfect demonstrations, i.e. suboptimal demonstrations (stage 1) and near-optimal demonstrations (stage 2). |
| Dataset Splits | No | The paper mentions 'evaluate the agent every 5,000 transitions in training' and 'conduct pre-training on WGAIL with about 10% of total interactions', but it does not specify a distinct validation dataset split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The reward function rϕ in WGAIL and SAIL is constrained into [0, 5] by a sigmoid function. As [Kumar et al., 2010] suggested, we conduct pre-training on WGAIL with about 10% of total interactions before the weight learning step in SAIL. The threshold K is initialized such that half of the demonstrations can be included. We evaluate the agent every 5,000 transitions in training and the reported result in Table 1 is the average of the last 100 evaluations. Also, we conduct our experiment with five random seeds. |
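
The experiment-setup row above quotes two concrete implementation details: the reward rϕ is squashed into [0, 5] by a sigmoid, and the threshold K is initialized so that half of the demonstrations are included. Below is a minimal sketch of how these two pieces might look in code, assuming PyTorch and hypothetical names (`RewardNet`, `initial_threshold`); it is an illustration under those assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a sigmoid-bounded reward model and a
# median-based initial threshold, assuming PyTorch and illustrative dimensions.
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP reward model r_phi(s, a); layer sizes are illustrative."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        logits = self.net(torch.cat([obs, act], dim=-1))
        # Constrain the reward to [0, 5] with a scaled sigmoid, as described in the paper.
        return 5.0 * torch.sigmoid(logits).squeeze(-1)


def initial_threshold(demo_scores: torch.Tensor) -> float:
    """Pick K as the median score, so that half of the demonstrations fall
    above the threshold (one plausible reading of the initialization)."""
    return demo_scores.median().item()


if __name__ == "__main__":
    reward_fn = RewardNet(obs_dim=11, act_dim=3)
    obs, act = torch.randn(8, 11), torch.randn(8, 3)
    rewards = reward_fn(obs, act)            # values lie in (0, 5)
    K = initial_threshold(rewards.detach())  # half of the scores exceed K
    print(rewards, K)
```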