A Bayesian Approach to Generative Adversarial Imitation Learning
Authors: Wonseok Jeon, Seokin Seo, Kee-Eung Kim
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our BGAIL on five continuous control tasks (Hopper-v1, Walker2d-v1, HalfCheetah-v1, Ant-v1, Humanoid-v1) from OpenAI Gym, implemented with the MuJoCo physics simulator [Todorov et al., 2012]. The imitation performances of vanilla GAIL, tuned GAIL and our algorithm are summarized in Table 1. |
| Researcher Affiliation | Collaboration | Wonseok Jeon (1), Seokin Seo (1), Kee-Eung Kim (1, 2); (1) School of Computing, KAIST, Republic of Korea; (2) PROWLER.io; {wsjeon, siseo}@ai.kaist.ac.kr, kekim@cs.kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Bayesian Generative Adversarial Imitation Learning (BGAIL) |
| Open Source Code | No | The paper states "our code was built on the GAIL implementation in OpenAI Baselines" and "Our SVGD was implemented using the code released by the authors", with footnote 2 linking to an SVGD repository. However, it does not explicitly state that the BGAIL code developed in this paper is publicly available. |
| Open Datasets | Yes | experts' trajectories were collected from the expert policy released by the authors of the original GAIL (Footnote 1), but our code was built on the GAIL implementation in OpenAI Baselines [Dhariwal et al., 2017] which uses TensorFlow [Abadi et al., 2016]. For the policy, a Gaussian policy was used with both mean and variance dependent on the observation. Footnote 1: https://github.com/openai/imitation (A sketch of this policy parameterization appears after the table.) |
| Dataset Splits | No | The paper describes evaluation over "50 independent trajectories" and "5 trained policies" and reports the "number of training iterations," but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions using the MuJoCo physics simulator and TensorFlow but does not provide any specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions software such as OpenAI Gym, MuJoCo, OpenAI Baselines, TensorFlow, the Adam optimizer, and SVGD, but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For all tasks, neural networks with 2 hidden layers were used for all policy and discriminator networks, with 100 hidden units per hidden layer and tanh activations. ... For the discriminator, the number of particles K was chosen to be 5. ... For training, we used an uninformative prior and SVGD along with the Adam optimizer [Kingma and Ba, 2014] ... In addition, 5 inner loops were used for updating discriminator parameters, which corresponds to the inner loop from line 6 to line 11 in Algorithm 1. (A toy sketch of this SVGD inner loop follows the table.) |
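Two of the quoted details above lend themselves to a concrete illustration: all policy and discriminator networks are 2-hidden-layer, 100-unit tanh MLPs, and the policy is Gaussian with both mean and variance dependent on the observation. Below is a minimal NumPy sketch of that parameterization. The actual implementation is TensorFlow code built on OpenAI Baselines; every function name here is hypothetical, and using a separate network head for the log standard deviation is an assumption about how the observation-dependent variance is realized.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Initialize an MLP; sizes like [obs_dim, 100, 100, out_dim]."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Two tanh hidden layers, linear output (the quoted 2 x 100 setup)."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def gaussian_policy_sample(mu_params, logstd_params, obs, rng):
    """Gaussian policy: mean and variance both depend on the observation."""
    mu = mlp(mu_params, obs)
    std = np.exp(mlp(logstd_params, obs))  # assumed second head for variance
    return mu + std * rng.standard_normal(mu.shape)

# Usage on Hopper-like shapes: 11-dim observation, 3-dim action.
rng = np.random.default_rng(0)
mu_params = init_mlp([11, 100, 100, 3], rng)
logstd_params = init_mlp([11, 100, 100, 3], rng)
action = gaussian_policy_sample(mu_params, logstd_params, np.zeros(11), rng)
```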
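The setup row also says the K = 5 discriminator particles are trained with SVGD and Adam inside a 5-step inner loop (lines 6 to 11 of Algorithm 1). The sketch below shows the standard SVGD direction (Liu and Wang's RBF-kernel form with the median-heuristic bandwidth, in the spirit of the reference implementation the authors cite) inside a deliberately toy loop: `toy_score` and the plain gradient-ascent step stand in for the real discriminator log-posterior gradients and Adam, and none of this is the authors' released code.

```python
import numpy as np

K_PARTICLES = 5        # number of discriminator particles (quoted setup)
INNER_DISC_STEPS = 5   # inner loop, lines 6-11 of Algorithm 1 (quoted setup)

def svgd_direction(theta, score):
    """SVGD update direction for particles theta (K x d), given the
    per-particle log-posterior gradients score (K x d)."""
    k = theta.shape[0]
    sq_dists = np.sum((theta[:, None, :] - theta[None, :, :]) ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(k + 1) + 1e-8    # median heuristic
    kernel = np.exp(-sq_dists / h)                    # RBF kernel matrix
    drive = kernel @ score                            # kernel-smoothed gradients
    repulse = (2.0 / h) * (theta * kernel.sum(1, keepdims=True) - kernel @ theta)
    return (drive + repulse) / k

# Toy stand-in for the discriminator's log-posterior gradient; the real
# algorithm differentiates a neural discriminator on expert vs. policy data.
def toy_score(theta):
    return -theta  # gradient of log N(0, I)

rng = np.random.default_rng(0)
particles = rng.standard_normal((K_PARTICLES, 8))     # 8-dim toy parameters
for _ in range(100):                                  # outer training iterations
    for _ in range(INNER_DISC_STEPS):                 # inner discriminator loop
        phi = svgd_direction(particles, toy_score(particles))
        particles += 1e-1 * phi                       # Adam in the paper; plain step here
```

The repulsion term is what distinguishes SVGD from running K independent gradient ascents: it pushes the particles apart so the ensemble covers the discriminator posterior rather than collapsing onto its mode.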