C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

Authors: Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, the C-GAIL regularizer improves the training of various existing GAIL methods, including the popular GAIL-DAC, by speeding up convergence, reducing the range of oscillation, and matching the expert distribution more closely.
Researcher Affiliation | Collaboration | 1. Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing 100084, China; 2. Microsoft Research
Pseudocode | Yes | Algorithm 1: The C-GAIL algorithm
Open Source Code | Yes | Additionally, we submit an additional zip file to reproduce our experimental results.
Open Datasets | Yes | We test five MuJoCo environments: HalfCheetah, Ant, Hopper, Reacher, and Walker2d.
Dataset Splits | No | The paper states: "We assess the normalized return over training for GAIL-DAC and C-GAIL-DAC to evaluate their speed of convergence and stability, reporting the mean and standard deviation over five random seeds." This describes evaluation metrics during training but does not specify a distinct validation split.
Hardware Specification | Yes | Our experiments are conducted on a single NVIDIA GeForce GTX TITAN X.
Software Dependencies | No | The paper mentions that "The networks are optimized using Adam with a learning rate of 10^-3, decayed by 0.5 every 10^5 gradient steps," but it does not provide version numbers for Adam or any other software libraries or frameworks used.
Experiment Setup | Yes | The discriminator is a two-layer MLP with 100 hidden units and tanh activations. The networks are optimized using Adam with a learning rate of 10^-3, decayed by 0.5 every 10^5 gradient steps. The number of provided expert demonstrations varies over {4, 7, 11, 15, 18}; unless stated otherwise, results use four demonstrations.
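The discriminator and optimizer configuration reported above can be sketched in a few lines. This is a minimal illustration assuming PyTorch, not the authors' released code: `obs_dim` is a hypothetical placeholder for the discriminator's input size, and "two-layer MLP" is interpreted here as two hidden layers of 100 units each.

```python
import torch
import torch.nn as nn

obs_dim = 23  # hypothetical input size (e.g., a state-action concatenation)

# Two-layer MLP with 100 hidden units and tanh activations, as described
# in the Experiment Setup row; the final linear layer produces one logit.
discriminator = nn.Sequential(
    nn.Linear(obs_dim, 100),
    nn.Tanh(),
    nn.Linear(100, 100),
    nn.Tanh(),
    nn.Linear(100, 1),
)

# Adam with learning rate 10^-3, decayed by a factor of 0.5 every
# 10^5 gradient steps (StepLR is one way to express that schedule).
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=100_000, gamma=0.5
)
```

After each optimizer step, calling `scheduler.step()` advances the decay counter, halving the learning rate once 10^5 steps have elapsed.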