DiffuserLite: Towards Real-time Diffusion Planning

Authors: Zibin Dong, Jianye Hao, Yifu Yuan, Fei Ni, Yitian Wang, Pengyi Li, Yan Zheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that DiffuserLite achieves a decision-making frequency of 122.2 Hz (112.7x faster than predominant frameworks) and reaches state-of-the-art performance on D4RL, Robomimic, and FinRL benchmarks.
Researcher Affiliation | Academia | Zibin Dong¹, Jianye Hao¹, Yifu Yuan¹, Fei Ni¹, Yitian Wang², Pengyi Li¹, Yan Zheng¹ (¹College of Intelligence and Computing, Tianjin University; ²UC San Diego Jacobs School of Engineering)
Pseudocode | Yes | We present the architecture overview in fig. 3, provide pseudocode for both training and inference in algorithm 1 and algorithm 2, and discuss detailed design choices in this section.
Open Source Code | Yes | The code and model checkpoints have been released.
Open Datasets | Yes | We evaluate the algorithm on various offline RL domains, including locomotion in Gym-MuJoCo [4], real-world manipulation in Franka Kitchen [13] and Robomimic [34], long-horizon navigation in Antmaze [12], and real-world stock trading in FinRL [41]. We train all models using publicly available datasets (see appendix A.1 for further details).
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or how it was used.
Hardware Specification | Yes | All runtime results across our experiments are obtained on a server equipped with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz and an NVIDIA GeForce RTX 3090.
Software Dependencies | No | The paper mentions software components like 'DiT', 'AdamW optimizer', 'Mish activation', and 'LayerNorm' but does not specify their version numbers.
Experiment Setup | Yes | We utilize DiT [40] as the neural network backbone for all diffusion models and rectified flows, with an embedding dimension of 256, 8 attention heads, and 2 DiT blocks. Across all the experiments, we employ DiffuserLite with 3 levels. In Kitchen, we utilize a planning horizon of 49 with temporal jumps for each level set to 16, 4, and 1, respectively. In MuJoCo and Antmaze, we use a planning horizon of 129 with temporal jumps of 32, 8, and 1 for each level, respectively. For diffusion models, we use cosine noise schedule [37] for αs and σs with diffusion steps T = 1000. We employ DDIM [45] to sample trajectories. In MuJoCo and Kitchen, we use 3 sampling steps, while in Antmaze, we use 5 sampling steps. For rectified flows, we use the Euler solver with 3 steps for all benchmarks. All models utilize the AdamW optimizer [31] with a learning rate of 2e-4 and weight decay of 1e-5. We perform 500K gradient updates with a batch size of 256. For conditional sampling, we tune the guidance strength w within the range of [0, 1].
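The few-step sampling described in the setup above (an Euler solver with 3 steps for rectified flows) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `velocity_model` is a hypothetical stand-in for the trained DiT velocity network.

```python
import numpy as np

def euler_sample(velocity_model, x_noise, num_steps=3):
    """Fixed-step Euler integration of dx/dt = v(x, t) from t = 0 (noise)
    to t = 1 (data), matching the 3-step rectified-flow sampling quoted
    in the experiment setup. `velocity_model(x, t)` is a hypothetical
    callable returning the predicted velocity at state x and time t."""
    x = np.asarray(x_noise, dtype=float)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_model(x, t)  # one Euler step along the flow
    return x

# Toy check with a linear field v(x, t) = -x: each Euler step scales x by
# (1 - dt), so 3 steps with dt = 1/3 scale x by (2/3)^3 = 8/27.
out = euler_sample(lambda x, t: -x, np.full(4, 27.0))
# out == array([8., 8., 8., 8.])
```

Few-step ODE solvers like this are what make the reported real-time decision frequency plausible: rectified flows are trained to follow near-straight probability paths, so a coarse 3-step Euler discretization loses little accuracy while cutting inference cost.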