Diffusion Imitation from Observation
Authors: Bo-Ruei Huang, Chun-Kai Yang, Chun-Mao Lai, Dai-Jie Wu, Shao-Hua Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method DIFO to various existing LfO methods in various continuous control domains, including navigation, locomotion, manipulation, and games. The experimental results show that DIFO consistently exhibits superior performance. |
| Researcher Affiliation | Academia | Bo-Ruei Huang Chun-Kai Yang Chun-Mao Lai Dai-Jie Wu Shao-Hua Sun Department of Electrical Engineering, National Taiwan University |
| Pseudocode | Yes | Appendix A: Pseudocode of DIFO. Algorithm 1: Diffusion Imitation from Observation (DIFO). |
| Open Source Code | No | Answer: [No] Justification: We plan to release code, expert datasets, and models recently. |
| Open Datasets | Yes | We collect 60 demonstrations (36,000 transitions) using a controller from Fu et al. [14]. We use 100 demonstrations (7,000 transitions) from Minari [50]. Table 2 (expert observations) details the collected expert observations for each task: POINTMAZE: 60 demonstrations, 36,000 transitions (D4RL [14]); ANTMAZE: 100 demonstrations, 7,000 transitions (Minari [50]). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It mentions collecting expert demonstrations and performing online interactions, but not how these are formally split for training, validation, or testing. |
| Hardware Specification | Yes | Table 6 (computational resources): Workstation 1: Intel Xeon w7-2475X, 2× NVIDIA GeForce RTX 4090, 125 GiB RAM; Workstation 2: Intel Xeon w5-2455X, 2× NVIDIA RTX A6000, 125 GiB RAM; Workstation 3: Intel Xeon W-2255, 2× NVIDIA GeForce RTX 4070 Ti, 125 GiB RAM; Workstation 4: Intel Xeon W-2255, 2× NVIDIA GeForce RTX 4070 Ti, 125 GiB RAM. |
| Software Dependencies | No | The paper mentions key software components such as 'Imitation', 'Stable Baselines3', the 'diffusers' package, and the Adam optimizer, but does not provide specific version numbers for these components, which are required for a reproducible setup. |
| Experiment Setup | Yes | Table 4: Hyperparameters. The overview of the hyperparameters used for all the methods in every task. We abbreviate 'Discriminator' as 'Disc.' in this table. Table 5: SAC & PPO training parameters. |