DiffAIL: Diffusion Adversarial Imitation Learning

Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Experimentally, the results show that our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks, including the standard state-action setting and state-only settings."
Researcher Affiliation | Academia | Bingzheng Wang, Guoqiang Wu*, Teng Pang, Yan Zhang, Yilong Yin* (Shandong University); binzhwang@gmail.com, guoqiangwu@sdu.edu.cn, silencept7@gmail.com, yannzhang9@gmail.com, ylyin@sdu.edu.cn
Pseudocode | Yes | "Algorithm 1: Diffusion Adversarial Imitation Learning" (a hedged sketch of this loop follows after the table)
Open Source Code | No | No explicit statement about releasing source code, and no link to a repository, is provided in the paper.
Open Datasets | Yes | "In our experiments, we compare DiffAIL with the SOTA algorithm in the Mujoco environments. We choose four representative tasks, namely Hopper, Half Cheetah, Walker2d, and Ant. We have 40 trajectories on each task dataset and randomly subsample n = [1, 4, 16] trajectories from the pool of 40 trajectories." (see the subsampling sketch after the table)
Dataset Splits | No | The paper does not explicitly describe standard train/validation/test splits with percentages, counts, or citations to predefined splits. It mentions subsampling trajectories for training and testing on held-out trajectories for one experiment, but not a general dataset partitioning methodology.
Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for the experiments is mentioned in the paper.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "Surprisingly in both settings, we share the hyperparameters instead of readjusting them. [...] Experiments use five random seeds (0, 1, 2, 3, 4), run for an equal number of steps per task, with 10 thousand steps per epoch. [...] Therefore, we finally select t = 10 as the diffusion step to balance discriminator quality and training cost." (collected into a config sketch after the table)
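
The pseudocode row points here. Below is a minimal sketch of what Algorithm 1 describes: a diffusion model over (state, action) pairs whose per-sample denoising loss is turned into a discriminator score, trained adversarially against policy samples, with a GAIL-style surrogate reward handed to the downstream RL learner. Only t = 10 diffusion steps is taken from the quoted setup; the network architecture, noise schedule, scalar time embedding, and exp(-loss) discriminator parameterization are this sketch's assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

T = 10  # diffusion steps, from "we finally select t = 10 as the diffusion step"

class DenoisingNet(nn.Module):
    """Predicts the noise added to a (state, action) vector at step t."""
    def __init__(self, x_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_t, t):
        t_emb = t.float().unsqueeze(-1) / T  # scalar time embedding (assumption)
        return self.net(torch.cat([x_t, t_emb], dim=-1))

# Linear noise schedule (an assumption; the paper's schedule is not quoted).
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def per_sample_diffusion_loss(model, x0):
    """Simple denoising loss per sample, at a random timestep."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    return ((model(x_t, t) - eps) ** 2).mean(dim=-1)  # shape (batch,)

def discriminator(model, x0):
    # D(x) in (0, 1]: low denoising loss on expert-like pairs pushes D toward 1.
    return torch.exp(-per_sample_diffusion_loss(model, x0))

def discriminator_step(model, opt, expert_x, policy_x, eps=1e-6):
    """One adversarial update: expert pairs are 'real', policy pairs 'fake'."""
    loss = -(torch.log(discriminator(model, expert_x) + eps).mean()
             + torch.log(1.0 - discriminator(model, policy_x) + eps).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def surrogate_reward(model, x0, eps=1e-6):
    """GAIL-style reward handed to the downstream RL learner."""
    with torch.no_grad():
        return -torch.log(1.0 - discriminator(model, x0) + eps)

# Usage (dims are hypothetical):
# model = DenoisingNet(x_dim=state_dim + action_dim)
# opt = torch.optim.Adam(model.parameters(), lr=3e-4)
```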
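
The Open Datasets row quotes a subsampling protocol: 40 expert trajectories per task, from which n = [1, 4, 16] are drawn at random. A hedged sketch of that step, assuming a simple list-of-trajectories container (the paper does not specify its data layout):

```python
import random

def subsample_trajectories(pool, n, seed=0):
    """Draw n trajectories at random from the full pool (40 per task)."""
    assert n <= len(pool)
    return random.Random(seed).sample(pool, n)

# Usage, with a hypothetical loader returning a list of 40 trajectories:
# pool = load_trajectories("Hopper")
# for n in (1, 4, 16):
#     demos = subsample_trajectories(pool, n, seed=0)
```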
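
Finally, the Experiment Setup row can be collected in one place. Only the five seeds, the shared hyperparameters across settings, the 10-thousand-steps-per-epoch schedule, and t = 10 are stated in the paper; the field names and structure below are illustrative.

```python
EXPERIMENT = {
    "tasks": ["Hopper", "Half Cheetah", "Walker2d", "Ant"],  # MuJoCo benchmarks
    "seeds": [0, 1, 2, 3, 4],        # five random seeds, as quoted
    "steps_per_epoch": 10_000,       # "10 thousand steps per epoch"
    "diffusion_steps": 10,           # t = 10 balances quality and training cost
    "shared_hyperparameters": True,  # same values in both settings, not retuned
}
```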