DiffuserLite: Towards Real-time Diffusion Planning
Authors: Zibin Dong, Jianye Hao, Yifu Yuan, Fei Ni, Yitian Wang, Pengyi Li, Yan Zheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that DiffuserLite achieves a decision-making frequency of 122.2Hz (112.7x faster than predominant frameworks) and reaches state-of-the-art performance on D4RL, Robomimic, and FinRL benchmarks. |
| Researcher Affiliation | Academia | Zibin Dong¹, Jianye Hao¹, Yifu Yuan¹, Fei Ni¹, Yitian Wang², Pengyi Li¹, Yan Zheng¹ — ¹College of Intelligence and Computing, Tianjin University; ²UC San Diego Jacobs School of Engineering |
| Pseudocode | Yes | We present the architecture overview in fig. 3, provide pseudocode for both training and inference in algorithm 1 and algorithm 2, and discuss detailed design choices in this section. |
| Open Source Code | Yes | The code and model checkpoints have been released. |
| Open Datasets | Yes | We evaluate the algorithm on various offline RL domains, including locomotion in Gym-MuJoCo [4], real-world manipulation in Franka Kitchen [13] and Robomimic [34], long-horizon navigation in Antmaze [12], and real-world stock trading in FinRL [41]. We train all models using publicly available datasets (see appendix A.1 for further details). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or how it was used. |
| Hardware Specification | Yes | All runtime results across our experiments are obtained on a server equipped with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz and an NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions software components like 'DiT', 'AdamW optimizer', 'Mish activation', and 'LayerNorm' but does not specify their version numbers. |
| Experiment Setup | Yes | We utilize DiT [40] as the neural network backbone for all diffusion models and rectified flows, with an embedding dimension of 256, 8 attention heads, and 2 DiT blocks. Across all the experiments, we employ DiffuserLite with 3 levels. In Kitchen, we utilize a planning horizon of 49 with temporal jumps for each level set to 16, 4, and 1, respectively. In MuJoCo and Antmaze, we use a planning horizon of 129 with temporal jumps of 32, 8, and 1 for each level, respectively. For diffusion models, we use cosine noise schedule [37] for αs and σs with diffusion steps T = 1000. We employ DDIM [45] to sample trajectories. In MuJoCo and Kitchen, we use 3 sampling steps, while in Antmaze, we use 5 sampling steps. For rectified flows, we use the Euler solver with 3 steps for all benchmarks. All models utilize the AdamW optimizer [31] with a learning rate of 2e-4 and weight decay of 1e-5. We perform 500K gradient updates with a batch size of 256. For conditional sampling, we tune the guidance strength w within the range of [0, 1]. |
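The coarse-to-fine horizon structure quoted in the Experiment Setup row can be sketched to show why each level's planning problem stays small. This is a hedged illustration, not the authors' released code: `level_timesteps` is a hypothetical helper, and the assumption that each level refines only the first segment of the level above follows the paper's 3-level description.

```python
# Sketch of DiffuserLite's multi-level horizon (assumption-laden, not the
# released implementation). MuJoCo/Antmaze setting: a planning horizon of
# 129 states (128 transition steps) with temporal jumps of 32, 8, and 1.

def level_timesteps(segment_length, jump):
    """Timesteps planned within one segment, sampled every `jump` steps."""
    return list(range(0, segment_length + 1, jump))

horizon_steps = 128   # 129 planned states = 128 transition steps
jumps = (32, 8, 1)    # temporal jump per level, coarse to fine

segment = horizon_steps
for jump in jumps:
    steps = level_timesteps(segment, jump)
    print(f"jump={jump:2d}: {len(steps)} planned states -> {steps}")
    segment = jump    # the next level refines only the first gap of this plan
```

Under these assumptions, each level plans only 5 to 9 states instead of all 129, which is consistent with the paper's reported 122.2Hz decision frequency: the per-level diffusion model generates a far shorter sequence with only 3 to 5 sampling steps.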