C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
Authors: Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, the C-GAIL regularizer improves the training of various existing GAIL methods, including the popular GAIL-DAC, by speeding up the convergence, reducing the range of oscillation, and matching the expert distribution more closely. |
| Researcher Affiliation | Collaboration | ¹Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing 100084, China; ²Microsoft Research |
| Pseudocode | Yes | Algorithm 1 The C-GAIL algorithm |
| Open Source Code | Yes | Additionally, we submit an additional zip file to reproduce our experimental results. |
| Open Datasets | Yes | We test five MuJoCo environments: HalfCheetah, Ant, Hopper, Reacher and Walker2d. |
| Dataset Splits | No | The paper states: "We assess the normalized return over training for GAIL-DAC and C-GAIL-DAC to evaluate their speed of convergence and stability, reporting the mean and standard deviation over five random seeds." This describes evaluation metrics during training (see the evaluation sketch after the table), but does not specify a distinct validation dataset split. |
| Hardware Specification | Yes | Our experiments are conducted on a single NVIDIA GeForce GTX TITAN X. |
| Software Dependencies | No | The paper mentions that "The networks are optimized using Adam with a learning rate of 10⁻³, decayed by 0.5 every 10⁵ gradient steps." However, it does not provide version numbers for any of the software libraries or frameworks used. |
| Experiment Setup | Yes | The discriminator architecture is a two-layer MLP with 100 hidden units and tanh activations. The networks are optimized using Adam with a learning rate of 10⁻³, decayed by 0.5 every 10⁵ gradient steps. We vary the number of provided expert demonstrations: {4, 7, 11, 15, 18}, though unless stated we report results using four demonstrations (a sketch of this setup follows the table). |
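
Read literally, the Experiment Setup row maps onto a few lines of code. The sketch below is a minimal, non-authoritative rendering, assuming PyTorch (the paper does not name the framework) and interpreting "two-layer MLP with 100 hidden units" as two hidden layers of 100 units each; the input dimension is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Discriminator matching the reported setup: a two-layer MLP with
    100 hidden units and tanh activations ("two-layer" is interpreted
    here as two hidden layers)."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 100),
            nn.Tanh(),
            nn.Linear(100, 100),
            nn.Tanh(),
            nn.Linear(100, 1),  # scalar logit: expert vs. policy sample
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Hypothetical input dimension (e.g., a state-action concatenation).
disc = Discriminator(input_dim=23)

# Adam at 1e-3, halved every 1e5 gradient steps, as quoted in the table.
optimizer = torch.optim.Adam(disc.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100_000, gamma=0.5)
```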
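
The evaluation protocol quoted under Dataset Splits (mean and standard deviation of normalized return across five random seeds) amounts to the small computation sketched here; the returns array and the normalization anchors are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Placeholder data: rows = 5 random seeds, columns = evaluation checkpoints.
raw_returns = np.random.default_rng(0).uniform(500.0, 3000.0, size=(5, 10))

# Hypothetical normalization anchors (random-policy and expert-level returns).
random_return, expert_return = 0.0, 3500.0
normalized = (raw_returns - random_return) / (expert_return - random_return)

# Mean and standard deviation over seeds at each checkpoint, as reported.
mean_over_seeds = normalized.mean(axis=0)
std_over_seeds = normalized.std(axis=0)
print(mean_over_seeds.round(3))
print(std_over_seeds.round(3))
```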