Generative Adversarial Imitation Learning
Authors: Jonathan Ho, Stefano Ermon
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our algorithm in Section 6, where we find that it outperforms competing methods by a wide margin in training policies for complex, high-dimensional physics-based control tasks over various amounts of expert data. We evaluated GAIL against baselines on 9 physics-based control tasks... |
| Researcher Affiliation | Collaboration | Jonathan Ho, OpenAI, hoj@openai.com; Stefano Ermon, Stanford University, ermon@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Generative adversarial imitation learning |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | Each task comes with a true cost function, defined in the OpenAI Gym [5]. |
| Dataset Splits | Yes | Behavioral cloning: a given dataset of state-action pairs is split into 70% training data and 30% validation data. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like "OpenAI Gym", "Adam", "TRPO", and "MuJoCo", but it does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | We used all algorithms to train policies of the same neural network architecture for all tasks: two hidden layers of 100 units each, with tanh nonlinearities in between. The discriminator networks for GAIL also used the same architecture. All networks were always initialized randomly at the start of each trial. For each task, we gave FEM, GTAL, and GAIL exactly the same amount of environment interaction for training. We ran all algorithms 5-7 times over different random seeds in all environments except Humanoid, due to time restrictions. |
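
The setup quoted in the table (a 70%/30% train/validation split for behavioral cloning, and networks with two hidden layers of 100 tanh units) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `split_dataset`, `init_mlp`, and `forward` are hypothetical, and the shuffling before the split, the uniform initialization scale, and the linear output layer are assumptions that the quoted text does not specify.

```python
import math
import random


def split_dataset(pairs, train_fraction=0.7, seed=0):
    """Shuffle (state, action) pairs and split them into train/validation lists.

    Shuffling before the split is an assumption; the source only states 70%/30%.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def init_mlp(in_dim, out_dim, hidden=100, seed=0):
    """Randomly initialize weights for in_dim -> hidden -> hidden -> out_dim."""
    rng = random.Random(seed)
    sizes = [in_dim, hidden, hidden, out_dim]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed init scale; not stated in the source
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply tanh after each hidden layer; the output layer stays linear (assumption)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x
```

For example, `split_dataset(list_of_pairs)` returns a training list and a validation list, and `forward(init_mlp(4, 2), [0.1, -0.2, 0.3, 0.4])` maps a 4-dimensional input to a 2-dimensional output. Passing a different `seed` to `init_mlp` mirrors the statement that networks were initialized randomly at the start of each trial.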