Adversarial Imitation Learning from Incomplete Demonstrations

Authors: Mingfei Sun, Xiaojuan Ma

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We compare AGAIL to other methods on benchmark tasks and show that AGAIL consistently delivers comparable performance to the state-of-the-art methods even when the action sequence in demonstrations is only partially available. ... Through various experiments on different levels of incompleteness of actions in demonstrations, we show that AGAIL consistently delivers comparable performance to two state-of-the-art algorithms even when the demonstrations provided are incomplete. |
| Researcher Affiliation | Academia | Mingfei Sun and Xiaojuan Ma, Department of Computer Science and Engineering, Hong Kong University of Science and Technology. mingfei.sun@ust.hk, mxj@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1: Action-guided adversarial imitation learning |
| Open Source Code | No | See project page: https://mingfeisun.github.io/agail/ (the project page states: 'Code for AGAIL will be released soon!') |
| Open Datasets | Yes | Four simulation tasks, Cart Pole, Hopper, Walker and Humanoid (from low-dimensional to high-dimensional controls), are selected to cover discrete and continuous state/action space, and the specifications are listed in Table 1. ... All algorithms are implemented based on the work [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes collecting demonstrations and masking actions, but does not specify explicit training/validation/test dataset splits or percentages for reproducing the experiment. |
| Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions OpenAI Gym [Brockman et al., 2016], TRPO [Schulman et al., 2015], and the Adam optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use stochastic policy parametrized by three fully connected layers (100 hidden units and Tanh activation), and construct the value network by sharing the layers with the policy network. Both policy net and value net are optimized through gradient descent with Adam optimizer. ... In the experiment, we set α to 1 and relate β to the incompleteness ratio η ∈ (0, 1) of actions in demonstrations, β = 1 − η. |
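
The Open Datasets row lists four OpenAI Gym benchmarks. Below is a minimal sketch of instantiating them; the environment IDs (e.g., Hopper-v2, Walker2d-v2) and the classic Gym API are assumptions, since the paper does not state which Gym versions were used.

```python
import gym

# Task IDs assumed from the paper's task list (Cart Pole, Hopper, Walker, Humanoid);
# exact environment versions are not specified in the paper.
TASK_IDS = ["CartPole-v1", "Hopper-v2", "Walker2d-v2", "Humanoid-v2"]

for task_id in TASK_IDS:
    env = gym.make(task_id)
    print(task_id, env.observation_space, env.action_space)
    env.close()
```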
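The Dataset Splits and Experiment Setup rows refer to masking a fraction η of demonstrated actions and weighting the action-guidance term with β = 1 − η. The NumPy sketch below illustrates that masking step under one plausible reading; the function name and the per-step random masking scheme are assumptions, as the paper's exact masking procedure is not quoted here.

```python
import numpy as np

def mask_actions(states, actions, eta, rng=None):
    """Hide a fraction eta of the demonstrated actions (assumed scheme).

    Returns the full state trajectory plus only the (state, action) pairs
    whose actions remain visible. eta = 0 keeps every action; eta close to 1
    leaves almost no action supervision.
    """
    rng = np.random.default_rng() if rng is None else rng
    visible = rng.random(len(actions)) >= eta   # keep each action with prob. 1 - eta
    return states, states[visible], actions[visible]

# Example: 2000 demonstration steps with 70% of actions missing.
states = np.random.randn(2000, 11)              # illustrative observation dimension
actions = np.random.randn(2000, 3)
eta = 0.7
_, guide_states, guide_actions = mask_actions(states, actions, eta)
beta = 1.0 - eta                                # beta = 1 - eta, as stated in the paper
print(guide_actions.shape, beta)
```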
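The Experiment Setup row describes a stochastic policy of three fully connected layers with 100 Tanh units each, a value network sharing those layers, and optimization with Adam. The following is a minimal PyTorch sketch of that architecture, assuming a diagonal-Gaussian policy head for continuous control and an illustrative learning rate; the paper's actual implementation builds on the OpenAI Gym ecosystem [Brockman et al., 2016] and may differ in detail.

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Three shared fully connected layers (100 Tanh units each) with a
    Gaussian policy head and a value head, per the Experiment Setup row."""

    def __init__(self, obs_dim, act_dim, hidden=100):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)   # policy mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value_head = nn.Linear(hidden, 1)        # value net shares the body

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mean_head(h), self.log_std.exp())
        return dist, self.value_head(h).squeeze(-1)

# Both heads optimized with Adam, as stated in the paper (learning rate assumed).
net = PolicyValueNet(obs_dim=11, act_dim=3)
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
```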