Diffusion-Reward Adversarial Imitation Learning

Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are conducted in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualized learned reward functions of GAIL and DRAIL suggest that DRAIL can produce more robust and smoother rewards."
Researcher Affiliation | Collaboration | "Chun-Mao Lai (1), Hsiang-Chun Wang (1), Ping-Chun Hsieh (2), Yu-Chiang Frank Wang (1, 3), Min-Hung Chen (3), Shao-Hua Sun (1) — 1: National Taiwan University; 2: National Yang Ming Chiao Tung University; 3: NVIDIA"
Pseudocode | Yes | "Algorithm 1: Diffusion-Reward Adversarial Imitation Learning (DRAIL)"
Open Source Code | No | "We plan to release the codes, models, and expert datasets upon acceptance."
Open Datasets | Yes | "We use the expert dataset provided by Lee et al. [33], which includes 100 demonstrations, comprising 18,525 transitions." "We use the demonstrations from Lee et al. [33], consisting of 20,311 transitions (664 trajectories)." "We use the demonstrations collected by Lee et al. [33], which contain 515 trajectories (10k transitions)."
Dataset Splits | No | The paper specifies training and testing, and mentions using five different random seeds for training, but it does not explicitly detail a separate validation split or how one was used beyond general evaluation.
Hardware Specification | Yes | "Machines 1 & 2: ASUS WS880T workstation; CPU: an Intel Xeon W-2255 (10C/20T, 19.25M, 4.5 GHz, 48-lane); GPUs: an NVIDIA RTX 3080 Ti and an NVIDIA RTX 3090; memory: 128 GB. Machine 3: ASUS WS880T workstation; CPU: an Intel Xeon W-2255 (10C/20T, 19.25M, 4.5 GHz, 48-lane); GPUs: two NVIDIA RTX 3080 Ti; memory: 128 GB."
Software Dependencies | No | "The Adam optimizer [27] is utilized for all methods, with the exception of the discriminator in WAIL, for which RMSProp is employed. Our conditional diffusion model is implemented using the diffusers package by von Platen et al. [57]."
Experiment Setup | Yes | "The hyperparameters employed for all methods across various tasks are outlined in Table 3. The PPO hyperparameters for each task are presented in Table 4."
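The table notes that the paper provides pseudocode (Algorithm 1: DRAIL). To make the underlying idea concrete, the sketch below shows a generic adversarial-imitation-learning reward derived from a discriminator output D(s, a); DRAIL's contribution is to replace the usual MLP discriminator with a diffusion-model-based classifier, but the reward shaping here is the standard AIL form. The function name `ail_reward` and the specific shaping `-log(1 - D(s, a))` are illustrative assumptions, not taken verbatim from the paper.

```python
import math

def ail_reward(p: float, eps: float = 1e-8) -> float:
    """GAIL-style reward from a discriminator output p = D(s, a) in (0, 1),
    where p estimates the probability that (s, a) came from the expert.
    Illustrative only: DRAIL computes p with a diffusion-based classifier,
    but the shaping below is the common adversarial imitation form.
    """
    p = min(max(p, eps), 1.0 - eps)  # clip to keep the log finite
    return -math.log(1.0 - p)

# Rewards grow as the discriminator judges the agent more expert-like.
rewards = [ail_reward(p) for p in (0.1, 0.5, 0.9)]
```

A smoother, better-calibrated discriminator therefore translates directly into a smoother reward landscape for the policy, which is the property the visualized reward comparison between GAIL and DRAIL is probing.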