A Bayesian Approach to Generative Adversarial Imitation Learning

Authors: Wonseok Jeon, Seokin Seo, Kee-Eung Kim

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluated our BGAIL on five continuous control tasks (Hopper-v1, Walker2d-v1, HalfCheetah-v1, Ant-v1, Humanoid-v1) from OpenAI Gym, implemented with the MuJoCo physics simulator [Todorov et al., 2012]. The imitation performances of vanilla GAIL, tuned GAIL and our algorithm are summarized in Table 1."
Researcher Affiliation | Collaboration | Wonseok Jeon (1), Seokin Seo (1), Kee-Eung Kim (1,2); (1) School of Computing, KAIST, Republic of Korea; (2) PROWLER.io; {wsjeon, siseo}@ai.kaist.ac.kr, kekim@cs.kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Bayesian Generative Adversarial Imitation Learning (BGAIL)
Open Source Code | No | The paper states "our code was built on the GAIL implementation in OpenAI Baselines" and "Our SVGD was implemented using the code released by the authors2", with footnote 2 linking to an SVGD repository. However, it does not explicitly state that the BGAIL code developed in this paper is publicly available.
Open Datasets | Yes | "Expert trajectories were collected from the expert policy released by the authors of the original GAIL1, but our code was built on the GAIL implementation in OpenAI Baselines [Dhariwal et al., 2017] which uses TensorFlow [Abadi et al., 2016]. For the policy, Gaussian policy was used with both mean and variance dependent on the observation." Footnote 1: https://github.com/openai/imitation
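The observation-dependent Gaussian policy quoted above can be sketched as a small network whose head emits both a mean and a log-standard-deviation per action dimension. This is a minimal pure-Python sketch, not the authors' code: the layer sizes, random initialization, and helper names (`make_layer`, `forward`) are illustrative assumptions (the paper uses two hidden layers of 100 tanh units).

```python
import math
import random

random.seed(0)

def make_layer(n_in, n_out):
    # Hypothetical stand-in for trained parameters: small random weights, zero biases.
    return ([[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def forward(layer, x, activation=None):
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [activation(v) for v in out] if activation else out

class GaussianPolicy:
    """Two hidden tanh layers (100 units in the paper; 8 here for brevity).
    Both the mean and the log-std heads read the hidden features, so the
    variance depends on the observation, as the quoted setup describes."""
    def __init__(self, obs_dim, act_dim, hidden=8):
        self.h1 = make_layer(obs_dim, hidden)
        self.h2 = make_layer(hidden, hidden)
        self.mean_head = make_layer(hidden, act_dim)
        self.logstd_head = make_layer(hidden, act_dim)

    def __call__(self, obs):
        h = forward(self.h1, obs, math.tanh)
        h = forward(self.h2, h, math.tanh)
        return forward(self.mean_head, h), forward(self.logstd_head, h)

    def sample(self, obs):
        # Reparameterized draw: action = mean + std * standard normal noise.
        mean, logstd = self(obs)
        return [m + math.exp(s) * random.gauss(0, 1) for m, s in zip(mean, logstd)]

policy = GaussianPolicy(obs_dim=3, act_dim=2)
action = policy.sample([0.1, -0.2, 0.3])
```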
Dataset Splits | No | The paper describes evaluation procedures using "50 independent trajectories" and training for "5 trained policies" and "number of training iterations," but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | The paper mentions using the MuJoCo physics simulator and TensorFlow but does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions software like OpenAI Gym, MuJoCo, OpenAI Baselines, TensorFlow, the Adam optimizer, and SVGD, but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | "For all tasks, neural networks with 2 hidden layers were used for all policy and discriminator networks, where 100 hidden units for each hidden layer and tanh activations are used. ... For the discriminator, the number of particles K was chosen to be 5. ... For training, we used uninformative prior and SVGD along with the Adam optimizer [Kingma and Ba, 2014] ... In addition, 5 inner loops were used for updating discriminator parameters, which corresponds to the inner loop from line 6 to line 11 in Algorithm 1."
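The SVGD update referenced in the setup moves K particles jointly: an attractive kernel-weighted gradient term pulls them toward high posterior density while a repulsive kernel-gradient term keeps the K = 5 particles diverse. This is a toy 1-D sketch under stated assumptions (standard SVGD with an RBF kernel of fixed bandwidth, a standard-normal stand-in target, and plain gradient steps rather than Adam); in BGAIL the particles would be discriminator parameter vectors and `grad_logp` the gradient of the unnormalized posterior.

```python
import math

def svgd_step(particles, grad_logp, eps=0.1, h=1.0):
    """One SVGD update over 1-D particles with RBF kernel k(x,y)=exp(-(x-y)^2/(2h)).
    phi(x_i) = (1/K) * sum_j [ k(x_j, x_i) * grad_logp(x_j) + d/dx_j k(x_j, x_i) ]
    The second (repulsive) term prevents the K particles from collapsing."""
    K = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2 * h))
            dk = -(xj - xi) / h * k  # derivative of the kernel w.r.t. x_j
            phi += k * grad_logp(xj) + dk
        updated.append(xi + eps * phi / K)
    return updated

# Toy target: standard normal, so grad log p(x) = -x. Five particles, matching
# the paper's choice of K = 5 (the target here is purely illustrative).
particles = [-3.0, -1.0, 0.5, 2.0, 4.0]
for _ in range(200):
    particles = svgd_step(particles, lambda x: -x)
```

After the loop, the particles approximate the target: their mean drifts toward 0 while the repulsive term keeps them spread out rather than all collapsing to the mode.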