f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Authors: Xin Zhang, Yanhua Li, Ziming Zhang, Zhi-Li Zhang

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with IL baselines with various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks. |
| Researcher Affiliation | Academia | Worcester Polytechnic Institute, USA; University of Minnesota, USA. {xzhang17,yli15,zzhang15}@wpi.edu, zhzhang@cs.umn.edu |
| Pseudocode | Yes | Algorithm 1: f-GAIL |
| Open Source Code | Yes | The code for reproducing the experiments is available at https://github.com/fGAIL3456/fGAIL. |
| Open Datasets | Yes | Six physics-based control tasks, including CartPole [8] from the classic RL literature, and five complex tasks simulated with MuJoCo [32]: HalfCheetah, Hopper, Reacher, Walker, and Humanoid. |
| Dataset Splits | Yes | A set of expert state-action pairs is split into 70% training data and 30% validation data; the policy is trained with supervised learning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) are provided. |
| Software Dependencies | No | The paper mentions software components such as OpenAI Gym, MuJoCo, Adam, and TRPO, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For fair comparison, the policy network structure πθ is the same for all baselines and f-GAIL in all experiments: two hidden layers of 100 units each, with tanh nonlinearities in between. The implementations of the reward signal networks and discriminators vary according to baseline architectures; these details are delegated to Appendix B. All networks are initialized randomly at the start of each trial. A minimal sketch of this setup appears below the table. |
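To make the reported setup concrete, here is a minimal sketch of the policy architecture (two hidden layers of 100 units with tanh in between) and the 70%/30% expert-data split described above. It assumes PyTorch; the names `PolicyNet`, `split_expert_data`, and the supervised (behavior-cloning-style) training loop are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch (assumes PyTorch). Names and the supervised training loop are
# illustrative; only the layer sizes and the 70/30 split come from the paper.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Policy network pi_theta: two hidden layers of 100 units, tanh nonlinearities."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 100), nn.Tanh(),
            nn.Linear(100, 100), nn.Tanh(),
            nn.Linear(100, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def split_expert_data(states: torch.Tensor, actions: torch.Tensor, train_frac: float = 0.7):
    """Split expert state-action pairs into 70% training / 30% validation data."""
    n = states.shape[0]
    perm = torch.randperm(n)
    n_train = int(train_frac * n)
    train_idx, val_idx = perm[:n_train], perm[n_train:]
    return (states[train_idx], actions[train_idx]), (states[val_idx], actions[val_idx])

# Hypothetical usage with expert demonstrations of shape (N, obs_dim) / (N, act_dim):
# policy = PolicyNet(obs_dim=expert_states.shape[1], act_dim=expert_actions.shape[1])
# (train_s, train_a), (val_s, val_a) = split_expert_data(expert_states, expert_actions)
# optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
# for _ in range(num_epochs):
#     optimizer.zero_grad()
#     loss = nn.functional.mse_loss(policy(train_s), train_a)
#     loss.backward()
#     optimizer.step()
```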