Adversarial Imitation Learning from Incomplete Demonstrations
Authors: Mingfei Sun, Xiaojuan Ma
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available. ... Through various experiments on different levels of incompleteness of actions in demonstrations, we show that AGAIL consistently delivers comparable performance to two state-of-the-art algorithms even when the demonstrations provided are incomplete. |
| Researcher Affiliation | Academia | Mingfei Sun and Xiaojuan Ma Department of Computer Science and Engineering, Hong Kong University of Science and Technology mingfei.sun@ust.hk, mxj@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1 Action-guided adversarial imitation learning |
| Open Source Code | No | See project page: https://mingfeisun.github.io/agail/ (The project page states: 'Code for AGAIL will be released soon!') |
| Open Datasets | Yes | Four simulation tasks, Cart Pole, Hopper, Walker and Humanoid (from low-dimensional to high-dimensional controls), are selected to cover discrete and continuous state/action space, and the specifications are listed in Table 1. ... All algorithms are implemented based on the work [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes collecting demonstrations and masking actions, but does not specify explicit training/test/validation dataset splits or percentages for reproducing the experiment. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory) are provided. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym' [Brockman et al., 2016], 'TRPO' [Schulman et al., 2015], and 'Adam optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use stochastic policy parametrized by three fully connected layers (100 hidden units and Tanh activation), and construct the value network by sharing the layers with the policy network. Both policy net and value net are optimized through gradient descent with the Adam optimizer. ... In the experiment, we set α to 1 and relate β to the incompleteness ratio η ∈ (0, 1) of actions in demonstrations, β = 1 − η. A minimal sketch of this setup appears below the table. |
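
The AGAIL code had not been released at the time of this report (see the Open Source Code row), so the setup description above only pins down the network shape and the α/β schedule. Below is a minimal PyTorch sketch, assuming a Gaussian policy head, Hopper-sized dimensions, and an illustrative learning rate; none of these names or values come from the authors' implementation.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Stochastic policy with a value head that shares the hidden layers,
    matching the paper's description: three fully connected layers of
    100 hidden units with Tanh activations."""

    def __init__(self, obs_dim, act_dim, hidden=100):
        super().__init__()
        # Shared trunk; the exact wiring of the unreleased code is unknown.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_mean = nn.Linear(hidden, act_dim)  # Gaussian mean for continuous control
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value_head = nn.Linear(hidden, 1)         # value net shares layers with the policy

    def forward(self, obs):
        h = self.trunk(obs)
        dist = torch.distributions.Normal(self.policy_mean(h), self.log_std.exp())
        return dist, self.value_head(h).squeeze(-1)

# Loss weights quoted in the paper: alpha is fixed to 1 and beta is tied to
# the incompleteness ratio eta in (0, 1) via beta = 1 - eta.
eta = 0.5                     # example fraction of missing demonstration actions
alpha, beta = 1.0, 1.0 - eta

net = PolicyValueNet(obs_dim=11, act_dim=3)               # Hopper-sized dimensions (assumed)
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)   # learning rate not reported; assumed
```

Producing the policy and value outputs from one module is one straightforward reading of the shared-layer construction the paper describes; the paper does not specify whether the value head is a separate linear layer or how the learning rate was set.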