C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

Authors: Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, the C-GAIL regularizer improves the training of various existing GAIL methods, including the popular GAIL-DAC, by speeding up convergence, reducing the range of oscillation, and matching the expert distribution more closely.
Researcher Affiliation | Collaboration | 1. Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing 100084, China; 2. Microsoft Research
Pseudocode | Yes | Algorithm 1: The C-GAIL algorithm
Open Source Code | Yes | Additionally, we submit an additional zip file to reproduce our experimental results.
Open Datasets | Yes | We test five MuJoCo environments: HalfCheetah, Ant, Hopper, Reacher, and Walker2d.
Dataset Splits | No | The paper states: "We assess the normalized return over training for GAIL-DAC and C-GAIL-DAC to evaluate their speed of convergence and stability, reporting the mean and standard deviation over five random seeds." This describes evaluation metrics during training but does not specify a distinct validation split.
Hardware Specification | Yes | Our experiments are conducted on a single NVIDIA GeForce GTX TITAN X.
Software Dependencies | No | The paper mentions that "The networks are optimized using Adam with a learning rate of 10^-3, decayed by 0.5 every 10^5 gradient steps," but it does not provide version numbers for Adam or any other software libraries or frameworks used.
Experiment Setup | Yes | The discriminator is a two-layer MLP with 100 hidden units and tanh activations. The networks are optimized using Adam with a learning rate of 10^-3, decayed by 0.5 every 10^5 gradient steps. The number of provided expert demonstrations varies over {4, 7, 11, 15, 18}; unless stated otherwise, results use four demonstrations.
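The discriminator and optimizer configuration reported above can be sketched in a few lines. This is a minimal illustration assuming PyTorch, not the authors' released code: `obs_dim` is a hypothetical placeholder for the discriminator's input size, and "two-layer MLP" is interpreted here as two hidden layers of 100 units each.

```python
import torch
import torch.nn as nn

obs_dim = 23  # hypothetical input size (e.g., a state-action concatenation)

# Two-layer MLP with 100 hidden units and tanh activations, as described
# in the Experiment Setup row; the final linear layer produces one logit.
discriminator = nn.Sequential(
    nn.Linear(obs_dim, 100),
    nn.Tanh(),
    nn.Linear(100, 100),
    nn.Tanh(),
    nn.Linear(100, 1),
)

# Adam with learning rate 10^-3, decayed by a factor of 0.5 every
# 10^5 gradient steps (StepLR is one way to express that schedule).
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=100_000, gamma=0.5
)
```

After each optimizer step, calling `scheduler.step()` advances the decay counter, halving the learning rate once 10^5 steps have elapsed.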