Diffusion Imitation from Observation

Authors: Bo-Ruei Huang, Chun-Kai Yang, Chun-Mao Lai, Dai-Jie Wu, Shao-Hua Sun

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We compare our method DIFO to various existing LfO methods in various continuous control domains, including navigation, locomotion, manipulation, and games. The experimental results show that DIFO consistently exhibits superior performance.
Researcher Affiliation Academia Bo-Ruei Huang, Chun-Kai Yang, Chun-Mao Lai, Dai-Jie Wu, Shao-Hua Sun — Department of Electrical Engineering, National Taiwan University
Pseudocode Yes Appendix A: Pseudocode of DIFO. Algorithm 1: Diffusion Imitation from Observation (DIFO).
Open Source Code No Answer: [No] Justification: We plan to release code, expert datasets, and models recently.
Open Datasets Yes We collect 60 demonstrations (36 000 transitions) using a controller from Fu et al. [14]. We use 100 demonstrations (7000 transitions) from Minari [50]. Table 2: Expert observations. Detailed information on collected expert observations in each task: POINTMAZE — 60 demonstrations, 36 000 transitions, D4RL [14]; ANTMAZE — 100 demonstrations, 7000 transitions, Minari [50].
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It mentions collecting expert demonstrations and performing online interactions, but not how these are formally split for training, validation, or testing.
Hardware Specification Yes Table 6: Computational resources. Workstation 1: Intel Xeon w7-2475X, 2× NVIDIA GeForce RTX 4090, 125 GiB. Workstation 2: Intel Xeon w5-2455X, 2× NVIDIA RTX A6000, 125 GiB. Workstation 3: Intel Xeon W-2255, 2× NVIDIA GeForce RTX 4070 Ti, 125 GiB. Workstation 4: Intel Xeon W-2255, 2× NVIDIA GeForce RTX 4070 Ti, 125 GiB.
Software Dependencies No The paper mentions key software components such as Imitation, Stable Baselines3, and the diffusers package, as well as the Adam optimizer, but does not provide the specific version numbers required for a reproducible setup.
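One common way to close this kind of gap is to record the exact installed version of each named dependency. The sketch below does this with the Python standard library; the PyPI package names (`stable-baselines3`, `imitation`, `diffusers`) are assumptions, since the paper does not state them.

```python
# Minimal sketch: pin the versions of the dependencies the paper names,
# so the software environment can be reproduced exactly.
# Package names below are assumed PyPI identifiers, not taken from the paper.
from importlib.metadata import version, PackageNotFoundError

PACKAGES = ["stable-baselines3", "imitation", "diffusers"]

def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, noting missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines

print("\n".join(pinned_requirements(PACKAGES)))
```

Writing these lines to a `requirements.txt` alongside the released code would let others reinstall the exact versions with `pip install -r requirements.txt`.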
Experiment Setup Yes Table 4: Hyperparameters. The overview of the hyperparameters used for all the methods in every task. We abbreviate 'Discriminator' as 'Disc.' in this table. Table 5: SAC & PPO training parameters.