DiffAIL: Diffusion Adversarial Imitation Learning

Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Experimentally, the results show that our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks, including the standard state-action setting and state-only settings."
Researcher Affiliation | Academia | Bingzheng Wang, Guoqiang Wu*, Teng Pang, Yan Zhang, Yilong Yin* (Shandong University); binzhwang@gmail.com, guoqiangwu@sdu.edu.cn, silencept7@gmail.com, yannzhang9@gmail.com, ylyin@sdu.edu.cn
Pseudocode | Yes | "Algorithm 1: Diffusion Adversarial Imitation Learning" (a hedged sketch of this loop follows after the table)
Open Source Code | No | No explicit statement about releasing source code, and no link to a repository, is provided in the paper.
Open Datasets | Yes | "In our experiments, we compare DiffAIL with the SOTA algorithm in the Mujoco environments. We choose four representative tasks, namely Hopper, Half Cheetah, Walker2d, and Ant. We have 40 trajectories on each task dataset and randomly subsample n = [1, 4, 16] trajectories from the pool of 40 trajectories." (see the subsampling sketch after the table)
Dataset Splits | No | The paper does not explicitly describe standard train/validation/test splits with percentages, counts, or citations to predefined splits. It mentions subsampling trajectories for training and testing on held-out trajectories for one experiment, but not a general dataset partitioning methodology.
Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for the experiments is mentioned in the paper.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "Surprisingly in both settings, we share the hyperparameters instead of readjusting them. [...] Experiments use five random seeds (0, 1, 2, 3, 4), run for an equal number of steps per task, with 10 thousand steps per epoch. [...] Therefore, we finally select t = 10 as the diffusion step to balance discriminator quality and training cost." (collected into a config sketch after the table)
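
The pseudocode row points here. Below is a minimal sketch of what Algorithm 1 describes: a diffusion model over (state, action) pairs whose per-sample denoising loss is turned into a discriminator score, trained adversarially against policy samples, with a GAIL-style surrogate reward handed to the downstream RL learner. Only t = 10 diffusion steps is taken from the quoted setup; the network architecture, noise schedule, scalar time embedding, and exp(-loss) discriminator parameterization are this sketch's assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

T = 10  # diffusion steps, from "we finally select t = 10 as the diffusion step"

class DenoisingNet(nn.Module):
    """Predicts the noise added to a (state, action) vector at step t."""
    def __init__(self, x_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t):
        t_emb = t.float().unsqueeze(-1) / T  # scalar time embedding (assumption)
        return self.net(torch.cat([x_t, t_emb], dim=-1))

# Linear noise schedule (an assumption; the paper's schedule is not quoted).
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def per_sample_diffusion_loss(model, x0):
    """Simple denoising loss per sample, at a random timestep."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    return ((model(x_t, t) - eps) ** 2).mean(dim=-1)  # shape (batch,)

def discriminator(model, x0):
    # D(x) in (0, 1]: low denoising loss on expert-like pairs pushes D toward 1.
    return torch.exp(-per_sample_diffusion_loss(model, x0))

def discriminator_step(model, opt, expert_x, policy_x, eps=1e-6):
    """One adversarial update: expert pairs are 'real', policy pairs 'fake'."""
    loss = -(torch.log(discriminator(model, expert_x) + eps).mean()
             + torch.log(1.0 - discriminator(model, policy_x) + eps).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def surrogate_reward(model, x0, eps=1e-6):
    """GAIL-style reward handed to the downstream RL learner."""
    with torch.no_grad():
        return -torch.log(1.0 - discriminator(model, x0) + eps)

# Usage (dims are hypothetical):
# model = DenoisingNet(x_dim=state_dim + action_dim)
# opt = torch.optim.Adam(model.parameters(), lr=3e-4)
```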
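
The Open Datasets row quotes a subsampling protocol: 40 expert trajectories per task, from which n = [1, 4, 16] are drawn at random. A hedged sketch of that step, assuming a simple list-of-trajectories container (the paper does not specify its data layout):

```python
import random

def subsample_trajectories(pool, n, seed=0):
    """Draw n trajectories at random from the full pool (40 per task)."""
    assert n <= len(pool)
    return random.Random(seed).sample(pool, n)

# Usage, with a hypothetical loader returning a list of 40 trajectories:
# pool = load_trajectories("Hopper")
# for n in (1, 4, 16):
#     demos = subsample_trajectories(pool, n, seed=0)
```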
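
Finally, the Experiment Setup row can be collected in one place. Only the five seeds, the shared hyperparameters across settings, the 10-thousand-steps-per-epoch schedule, and t = 10 are stated in the paper; the field names and structure below are illustrative.

```python
EXPERIMENT = {
    "tasks": ["Hopper", "Half Cheetah", "Walker2d", "Ant"],  # MuJoCo benchmarks
    "seeds": [0, 1, 2, 3, 4],        # five random seeds, as quoted
    "steps_per_epoch": 10_000,       # "10 thousand steps per epoch"
    "diffusion_steps": 10,           # t = 10 balances quality and training cost
    "shared_hyperparameters": True,  # same values in both settings, not retuned
}
```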