Diffusion-Reward Adversarial Imitation Learning
Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualized learned reward functions of GAIL and DRAIL suggest that DRAIL can produce more robust and smoother rewards. |
| Researcher Affiliation | Collaboration | Chun-Mao Lai1 Hsiang-Chun Wang1 Ping-Chun Hsieh2 Yu-Chiang Frank Wang1,3 Min-Hung Chen3 Shao-Hua Sun1 1National Taiwan University 2National Yang Ming Chiao Tung University 3NVIDIA |
| Pseudocode | Yes | Algorithm 1 Diffusion-Reward Adversarial Imitation Learning (DRAIL) |
| Open Source Code | No | We plan to release the codes, models, and expert datasets upon acceptance. |
| Open Datasets | Yes | We use the expert dataset provided by Lee et al. [33], which includes 100 demonstrations, comprising 18,525 transitions. We use the demonstrations from Lee et al. [33], consisting of 20,311 transitions (664 trajectories). We use the demonstrations collected by Lee et al. [33], which contain 515 trajectories (10k transitions). |
| Dataset Splits | No | The paper specifies training and testing, and mentions using five different random seeds for training, but it does not explicitly detail a separate validation set split or how it was used beyond general evaluation. |
| Hardware Specification | Yes | Machines 1 & 2: ASUS WS880T workstation; CPU: Intel Xeon W-2255 (10C/20T, 19.25M, 4.5GHz, 48-lane); GPUs: one NVIDIA RTX 3080 Ti and one NVIDIA RTX 3090; memory: 128GB. Machine 3: ASUS WS880T workstation; CPU: Intel Xeon W-2255 (10C/20T, 19.25M, 4.5GHz, 48-lane); GPUs: two NVIDIA RTX 3080 Ti; memory: 128GB. |
| Software Dependencies | No | The Adam optimizer [27] is utilized for all methods, with the exception of the discriminator in WAIL, for which RMSProp is employed. Our conditional diffusion model is implemented using the diffusers package by von Platen et al. [57]. |
| Experiment Setup | Yes | The hyperparameters employed for all methods across various tasks are outlined in Table 3. The PPO hyperparameters for each task are presented in Table 4. |
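For context on the adversarial-reward setup the table refers to: DRAIL, like GAIL, trains a discriminator to separate expert from agent transitions and converts its output into a reward for policy optimization (the paper replaces the GAN discriminator with a conditional diffusion model). The sketch below shows only the generic AIL-style reward mapping, with a hypothetical linear discriminator standing in for the learned model; all names here are illustrative, not from the paper's code.

```python
import numpy as np

def discriminator_logit(state, action, w):
    # Hypothetical linear scorer for a (state, action) pair.
    # DRAIL would instead derive this score from a conditional
    # diffusion model; GAIL uses a learned neural classifier.
    x = np.concatenate([state, action])
    return float(w @ x)

def adversarial_reward(state, action, w):
    # Common AIL-style reward r(s, a) = -log(1 - D(s, a)),
    # where D = sigmoid(logit) estimates P(expert | s, a).
    d = 1.0 / (1.0 + np.exp(-discriminator_logit(state, action, w)))
    return -np.log(1.0 - d + 1e-8)

# Usage: transitions the discriminator scores as more expert-like
# receive higher rewards, which the policy (e.g. PPO) then maximizes.
s, a = np.ones(3), np.ones(2)
r_lo = adversarial_reward(s, a, np.full(5, -1.0))  # low expert score
r_hi = adversarial_reward(s, a, np.full(5, +1.0))  # high expert score
```

A smoother discriminator yields a smoother reward surface, which is the property the visualizations quoted above attribute to DRAIL relative to GAIL.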