f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
Authors: Xin Zhang, Yanhua Li, Ziming Zhang, Zhi-Li Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with IL baselines with various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks. |
| Researcher Affiliation | Academia | Worcester Polytechnic Institute, USA; University of Minnesota, USA. {xzhang17,yli15,zzhang15}@wpi.edu, zhzhang@cs.umn.edu |
| Pseudocode | Yes | Algorithm 1 f-GAIL |
| Open Source Code | Yes | The code for reproducing the experiments is available at https://github.com/fGAIL3456/fGAIL. |
| Open Datasets | Yes | Six physics-based control tasks, including CartPole [8] from the classic RL literature, and five complex tasks simulated with MuJoCo [32]: HalfCheetah, Hopper, Reacher, Walker, and Humanoid. |
| Dataset Splits | Yes | A set of expert state-action pairs is split into 70% training data and 30% validation data. The policy is trained with supervised learning. (See the split sketch after this table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) are provided. |
| Software Dependencies | No | The paper mentions software components like OpenAI Gym, MuJoCo, Adam, and TRPO, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For fair comparisons, the policy network structures πθ of all the baselines and f-GAIL are the same in all experiments, with two hidden layers of 100 units each and tanh nonlinearities in between. The implementations of reward signal networks and discriminators vary according to baseline architectures, and we delegate these implementation details to Appendix B. All networks were always initialized randomly at the start of each trial. (See the architecture sketch after this table.) |
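
The Dataset Splits row above reports a 70%/30% split of expert state-action pairs into training and validation data. Below is a minimal sketch of such a split; the function name, array arguments, and random seed are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def split_expert_data(states, actions, train_frac=0.7, seed=0):
    """Illustrative 70/30 split of expert state-action pairs into
    training and validation sets (names and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(states))
    n_train = int(train_frac * len(states))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return (states[train_idx], actions[train_idx]), (states[val_idx], actions[val_idx])
```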
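The Experiment Setup row describes the shared policy network: two hidden layers of 100 units with tanh nonlinearities in between. The following PyTorch sketch shows that architecture under stated assumptions: `obs_dim` and `act_dim` are hypothetical placeholders, and the deterministic output head is a simplification; the paper's policies are trained with TRPO, so the actual output layer would parameterize an action distribution.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Sketch of the shared policy architecture:
    obs -> Linear(100) -> tanh -> Linear(100) -> tanh -> action.
    obs_dim and act_dim are task-dependent placeholders (e.g., Hopper, Walker)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 100), nn.Tanh(),
            nn.Linear(100, 100), nn.Tanh(),
            nn.Linear(100, act_dim),  # mean action; a stochastic policy would add a log-std head
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)
```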