Adversarial Imitation Learning from Incomplete Demonstrations

Authors: Mingfei Sun, Xiaojuan Ma

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available. ... Through various experiments on different levels of incompleteness of actions in demonstrations, we show that AGAIL consistently delivers comparable performance to two state-of-the-art algorithms even when the demonstrations provided are incomplete. |
| Researcher Affiliation | Academia | Mingfei Sun and Xiaojuan Ma, Department of Computer Science and Engineering, Hong Kong University of Science and Technology. mingfei.sun@ust.hk, mxj@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1: Action-guided adversarial imitation learning |
| Open Source Code | No | See project page: https://mingfeisun.github.io/agail/ (the project page states: 'Code for AGAIL will be released soon!') |
| Open Datasets | Yes | Four simulation tasks, Cart Pole, Hopper, Walker and Humanoid (from low-dimensional to high-dimensional controls), are selected to cover discrete and continuous state/action space, and the specifications are listed in Table 1. ... All algorithms are implemented based on the work [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes collecting demonstrations and masking actions, but does not specify explicit training/validation/test dataset splits or percentages for reproducing the experiment. |
| Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions OpenAI Gym [Brockman et al., 2016], TRPO [Schulman et al., 2015], and the Adam optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use stochastic policy parametrized by three fully connected layers (100 hidden units and Tanh activation), and construct the value network by sharing the layers with the policy network. Both policy net and value net are optimized through gradient descent with Adam optimizer. ... In the experiment, we set α to 1 and relate β to the incompleteness ratio η ∈ (0, 1) of actions in demonstrations, β = 1 − η. |
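
The Open Datasets row lists four OpenAI Gym benchmarks. Below is a minimal sketch of instantiating them; the environment IDs (e.g., Hopper-v2, Walker2d-v2) and the classic Gym API are assumptions, since the paper does not state which Gym versions were used.

```python
import gym

# Task IDs assumed from the paper's task list (Cart Pole, Hopper, Walker, Humanoid);
# exact environment versions are not specified in the paper.
TASK_IDS = ["CartPole-v1", "Hopper-v2", "Walker2d-v2", "Humanoid-v2"]

for task_id in TASK_IDS:
    env = gym.make(task_id)
    print(task_id, env.observation_space, env.action_space)
    env.close()
```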
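The Dataset Splits and Experiment Setup rows refer to masking a fraction η of demonstrated actions and weighting the action-guidance term with β = 1 − η. The NumPy sketch below illustrates that masking step under one plausible reading; the function name and the per-step random masking scheme are assumptions, as the paper's exact masking procedure is not quoted here.

```python
import numpy as np

def mask_actions(states, actions, eta, rng=None):
    """Hide a fraction eta of the demonstrated actions (assumed scheme).

    Returns the full state trajectory plus only the (state, action) pairs
    whose actions remain visible. eta = 0 keeps every action; eta close to 1
    leaves almost no action supervision.
    """
    rng = np.random.default_rng() if rng is None else rng
    visible = rng.random(len(actions)) >= eta   # keep each action with prob. 1 - eta
    return states, states[visible], actions[visible]

# Example: 2000 demonstration steps with 70% of actions missing.
states = np.random.randn(2000, 11)              # illustrative observation dimension
actions = np.random.randn(2000, 3)
eta = 0.7
_, guide_states, guide_actions = mask_actions(states, actions, eta)
beta = 1.0 - eta                                # beta = 1 - eta, as stated in the paper
print(guide_actions.shape, beta)
```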
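The Experiment Setup row describes a stochastic policy of three fully connected layers with 100 Tanh units each, a value network sharing those layers, and optimization with Adam. The following is a minimal PyTorch sketch of that architecture, assuming a diagonal-Gaussian policy head for continuous control and an illustrative learning rate; the paper's actual implementation builds on the OpenAI Gym ecosystem [Brockman et al., 2016] and may differ in detail.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Three shared fully connected layers (100 Tanh units each) with a
    Gaussian policy head and a value head, per the Experiment Setup row."""

    def __init__(self, obs_dim, act_dim, hidden=100):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)   # policy mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value_head = nn.Linear(hidden, 1)        # value net shares the body

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mean_head(h), self.log_std.exp())
        return dist, self.value_head(h).squeeze(-1)

# Both heads optimized with Adam, as stated in the paper (learning rate assumed).
net = PolicyValueNet(obs_dim=11, act_dim=3)
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
```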