Variational Distillation of Diffusion Policies into Mixture of Experts
Authors: Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | VDD demonstrates across nine complex behavior learning tasks that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE. The thorough experimental evaluation on nine sophisticated behavior learning tasks shows that VDD i) accurately distills complex distributions, ii) outperforms existing SOTA distillation methods, and iii) surpasses conventional MoE training methods. |
| Researcher Affiliation | Academia | Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov (Intuitive Robots Lab, Karlsruhe Institute of Technology; Autonomous Learning Robots, Karlsruhe Institute of Technology; FZI Research Center for Information Technology) |
| Pseudocode | Yes | Algorithm 1 VDD training |
| Open Source Code | Yes | The code and videos are available at https://intuitive-robots.github.io/vdd-website. We will open-source the code in the near future once it is cleaned up and anonymity is not a concern anymore. |
| Open Datasets | Yes | We conducted imitation learning experiments by distilling two types of diffusion models: variance preserving (VP) [2, 12] and variance exploding (VE) [65, 4]. We selected DDPM as the representative for VP and BESO as the representative for VE. Datasets include Relay Kitchen [67] and XArm Block Push [68]; D3IL [13] is a simulation benchmark with diverse human demonstrations. |
| Dataset Splits | No | The paper mentions training data and test data but does not explicitly detail the split percentages for training, validation, and testing sets. |
| Hardware Specification | Yes | The predictions were conducted using the same system (RTX 3070 GPU, Intel i7-12700 CPU). We conducted the evaluation on the state-based avoiding task using a machine with an RTX 3070 GPU and an i7-13700 CPU. Each node contains 4 NVIDIA A100 GPUs, and we use one GPU for each method. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Detailed descriptions of the baseline implementations and hyperparameter selection can be found in Appendices D and E. We executed a large-scale grid search to fine-tune key hyperparameters for each baseline method. For other hyperparameters, we use the values specified in the respective original papers. The key hyperparameters swept during the experiment phase for the GPT backbone (shared by all methods) are summarized in the table below. |

Grid-searched hyperparameters of the GPT backbone (shared by all methods):

| Parameter | Avoiding | Aligning | Pushing | Stacking | Sorting-Vision | Stacking-Vision | Kitchen | Block Push |
|---|---|---|---|---|---|---|---|---|
| Number of Layers | 4 | 4 | 4 | 4 | 6 | 6 | 6 | 4 |
| Number of Attention Heads | 4 | 4 | 4 | 4 | 6 | 6 | 12 | 12 |
| Embedding Dimension | 72 | 72 | 72 | 72 | 120 | 120 | 240 | 192 |
| Window Size | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 5 |
| Optimizer | Adam | Adam | Adam | Adam | Adam | Adam | Adam | Adam |
| Learning Rate | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 |
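To make the grid-searched backbone settings easier to reuse, the sketch below collects them into a small Python configuration mapping. This is a minimal illustration, not code from the authors' repository: the `GPTBackboneConfig` dataclass and its field names (`n_layers`, `n_heads`, `embed_dim`, `window_size`, `optimizer`, `lr`) are hypothetical, and the uniform `1e-4` learning rate follows the reconstructed table above.

```python
from dataclasses import dataclass

@dataclass
class GPTBackboneConfig:
    """Hypothetical container for the grid-searched GPT backbone settings."""
    n_layers: int       # number of transformer layers
    n_heads: int        # number of attention heads
    embed_dim: int      # embedding dimension
    window_size: int    # observation window size
    optimizer: str = "Adam"
    lr: float = 1e-4    # learning rate as listed in the table above

# Per-task values taken from the hyperparameter table above.
BACKBONE_CONFIGS = {
    "Avoiding":        GPTBackboneConfig(n_layers=4, n_heads=4,  embed_dim=72,  window_size=5),
    "Aligning":        GPTBackboneConfig(n_layers=4, n_heads=4,  embed_dim=72,  window_size=5),
    "Pushing":         GPTBackboneConfig(n_layers=4, n_heads=4,  embed_dim=72,  window_size=5),
    "Stacking":        GPTBackboneConfig(n_layers=4, n_heads=4,  embed_dim=72,  window_size=5),
    "Sorting-Vision":  GPTBackboneConfig(n_layers=6, n_heads=6,  embed_dim=120, window_size=5),
    "Stacking-Vision": GPTBackboneConfig(n_layers=6, n_heads=6,  embed_dim=120, window_size=5),
    "Kitchen":         GPTBackboneConfig(n_layers=6, n_heads=12, embed_dim=240, window_size=4),
    "Block Push":      GPTBackboneConfig(n_layers=4, n_heads=12, embed_dim=192, window_size=5),
}
```

A training script could then look up, for example, `BACKBONE_CONFIGS["Kitchen"]` when instantiating the transformer backbone for that task.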