Model-based Diffusion for Trajectory Optimization

Authors: Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks.
Researcher Affiliation | Academia | Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu, Carnegie Mellon University, {chaoyip,zejiy,guanyas,gqu}@andrew.cmu.edu
Pseudocode | Yes | Algorithm 1: Model-based Diffusion for Generic Optimization; Algorithm 2: Model-based Diffusion for Trajectory Optimization
Open Source Code | Yes | Videos and codes: https://lecar-lab.github.io/mbd/
Open Datasets | Yes | For Humanoid Jogging, we use data from the CMU Mocap dataset [1], from which we extract torso, thigh, and shin positions and use them as a partial state reference.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes the environments and tasks used for evaluation, but not how a specific dataset was partitioned for training and validation purposes.
Hardware Specification | Yes | All the experiments were conducted on a single NVIDIA RTX 4070 Ti GPU. For the BO benchmarks, the experiments were conducted on an A100 GPU because of the high computational demands of the Gaussian Process Regression model it incorporates.
Software Dependencies | No | The paper mentions several software tools and frameworks, including Google Brax, PPO, SAC, CMA-ES, CEM, MPPI, pycma, and Nevergrad, but it does not provide specific version numbers for these components, which are needed to reproduce the software environment.
Experiment Setup | Yes | We use the same hyperparameters for all the tasks, with small tweaks for harder tasks; the per-task Horizon, Sample Number, and Temperature λ are listed in Table 4. For diffusion noise scheduling, we use simple linear scheduling with β0 = 1×10⁻⁴ and βN = 1×10⁻², and the diffusion step number is 100 across all tasks. For the reinforcement learning implementation, we strictly follow the hyperparameters and implementation details provided by the original Brax repository, which are optimized for the best performance. The hyperparameters for the RL tasks are shown in Table 5 and Table 6.
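For illustration, a minimal sketch of the linear noise schedule quoted in the Experiment Setup row (β0 = 1×10⁻⁴, βN = 1×10⁻², 100 diffusion steps). The function name and the cumulative-product usage are assumptions in the style of standard diffusion implementations, not taken from the MBD codebase.

```python
import numpy as np

def linear_beta_schedule(beta_0=1e-4, beta_n=1e-2, n_steps=100):
    """Linearly spaced noise levels beta_i, per the schedule described above."""
    betas = np.linspace(beta_0, beta_n, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal-retention factors
    return betas, alphas, alpha_bars

betas, alphas, alpha_bars = linear_beta_schedule()
print(betas[0], betas[-1], alpha_bars[-1])
```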