DiffAIL: Diffusion Adversarial Imitation Learning
Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, the results show that our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks, including the standard state-action setting and state-only settings. |
| Researcher Affiliation | Academia | Bingzheng Wang, Guoqiang Wu*, Teng Pang, Yan Zhang, Yilong Yin* Shandong University binzhwang@gmail.com, guoqiangwu@sdu.edu.cn, silencept7@gmail.com, yannzhang9@gmail.com, ylyin@sdu.edu.cn |
| Pseudocode | Yes | Algorithm 1: Diffusion Adversarial Imitation Learning (a hedged sketch of the core algorithm appears after this table) |
| Open Source Code | No | No explicit statement about releasing source code or a link to a repository is provided in the paper. |
| Open Datasets | Yes | In our experiments, we compare DiffAIL with the SOTA algorithm in the Mujoco environments. We choose four representative tasks, namely Hopper, HalfCheetah, Walker2d, and Ant. We have 40 trajectories on each task dataset and randomly subsample n = [1, 4, 16] trajectories from the pool of 40 trajectories. (See the subsampling sketch after this table.) |
| Dataset Splits | No | The paper does not explicitly describe standard train/validation/test dataset splits with percentages, counts, or citations to predefined splits. It mentions subsampling trajectories for training and testing on 'held-out' trajectories for a specific experiment, but not a general dataset partitioning methodology. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for running experiments is mentioned in the paper. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Surprisingly in both settings, we share the hyperparameters instead of readjusting them. [...] Experiments use five random seeds (0, 1, 2, 3, 4), run for an equal number of steps per task, with 10 thousand steps per epoch. [...] Therefore, we finally select t = 10 as the diffusion step to balance discriminator quality and training cost. |
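
Since the Pseudocode row cites Algorithm 1 but no source code is released, the following is a minimal, hedged sketch of the paper's core idea: score a state-action pair by the per-sample denoising loss of a small unconditional diffusion model over concatenated (s, a) vectors, using t = 10 diffusion steps as in the Experiment Setup row. The network sizes, noise schedule, and the exp(-loss) mapping to a (0, 1] discriminator output are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch: a diffusion-loss-based discriminator for adversarial
# imitation learning. Architecture and schedule are assumptions.
import torch
import torch.nn as nn

class DiffusionDiscriminator(nn.Module):
    def __init__(self, sa_dim, hidden=256, T=10):
        super().__init__()
        self.T = T
        betas = torch.linspace(1e-4, 2e-2, T)           # assumed linear noise schedule
        self.register_buffer("alphas_bar", torch.cumprod(1.0 - betas, dim=0))
        self.eps_net = nn.Sequential(                   # noise predictor eps_theta(x_t, t)
            nn.Linear(sa_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, sa_dim),
        )

    def diffusion_loss(self, x0):
        """Per-sample denoising MSE at a random timestep."""
        t = torch.randint(0, self.T, (x0.shape[0],), device=x0.device)
        ab = self.alphas_bar[t].unsqueeze(-1)
        eps = torch.randn_like(x0)
        xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # forward process q(x_t | x_0)
        t_in = t.float().unsqueeze(-1) / self.T         # simple scalar time embedding
        pred = self.eps_net(torch.cat([xt, t_in], dim=-1))
        return ((pred - eps) ** 2).mean(dim=-1)         # one scalar per (s, a) pair

    def forward(self, s, a):
        # Low denoising loss => the pair looks expert-like; map to (0, 1].
        return torch.exp(-self.diffusion_loss(torch.cat([s, a], dim=-1)))
```

In a GAIL-style loop, this discriminator would be trained with binary cross-entropy to separate expert pairs from policy pairs, and the imitation reward derived from its output, e.g. r = -log(1 - D(s, a)); the paper's exact objective may differ from this sketch.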
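The data protocol quoted in the Open Datasets and Experiment Setup rows (40 expert trajectories per task, subsample n in {1, 4, 16}, seeds 0-4, four MuJoCo tasks) is concrete enough to sketch reproducibly. The pool size constant, loop structure, and per-seed seeding scheme below are assumptions consistent with the quoted text, not the authors' released code.

```python
# Hedged sketch of the trajectory-subsampling protocol described above.
import random

TASKS = ["Hopper", "HalfCheetah", "Walker2d", "Ant"]
SEEDS = [0, 1, 2, 3, 4]        # the five seeds quoted in the Experiment Setup row
POOL_SIZE = 40                 # trajectories available per task

def subsample_trajectories(pool_size, n, seed):
    """Pick n trajectory indices out of the pool, reproducibly per seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(pool_size), n))

for task in TASKS:
    for n in (1, 4, 16):
        for seed in SEEDS:
            idx = subsample_trajectories(POOL_SIZE, n, seed)
            # A real run would train on the selected demonstrations here,
            # for the same number of steps on every task.
            print(task, n, seed, idx)
```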