Adaptive Online Replanning with Diffusion Models
Authors: Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, Chuang Gan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate that RDM obtains strong empirical performance across a set of different benchmarks, providing a 38% boost over past diffusion planning approaches on Maze2D and further enabling effective handling of stochastic and long-horizon robotic control tasks. In this section, we empirically use our proposed RDM algorithm for replanning in multiple decision-making tasks (as illustrated in Figure 2) and evaluate the performance of the generated trajectories. |
| Researcher Affiliation | Collaboration | Siyuan Zhou1, Yilun Du2, Shun Zhang3, Mengdi Xu4, Yikang Shen3, Wei Xiao2, Dit-Yan Yeung1, Chuang Gan3,5 1Hong Kong University of Science and Technology 2Massachusetts Institute of Technology, 3MIT-IBM Watson AI Lab 4Carnegie Mellon University, 5UMass Amherst |
| Pseudocode | Yes | Algorithm 1: When to Replan; Algorithm 2: Replanning from scratch; Algorithm 3: Replanning with future. (See the replanning sketch after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the methodology's code. |
| Open Datasets | Yes | We consider a stochastic variation of the D4RL locomotion benchmark [10] by adding randomness to the transition function. We consider the RLBench domain [15], which consists of 74 challenging vision-based robotic learning tasks. |
| Dataset Splits | No | The paper describes the use of various datasets (Maze2D, D4RL, RLBench) but does not provide specific numerical train/validation/test split percentages or counts for reproduction. |
| Hardware Specification | Yes | We perform the whole experiment with a total of three Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software components and techniques such as Adam optimizer, CLIP, Group Norm, and Mish nonlinearity, but it does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | The model is trained using the Adam optimizer [20] with a learning rate of 2e-4 and a batch size of 64 for 1e6 training steps. The planning horizon is set to 128 in Maze2D/Multi2D U-Maze, 256 in Maze2D/Multi2D Medium, 256 in Maze2D/Multi2D Large, 64 in Stochastic Environments, and 64 in RLBench. We use a threshold of 0.7 for Replan from scratch and a threshold of 0.5 for Replan with future. The probability ϵ of random actions is set to 0.03 in Stochastic Environments. The diffusion steps i for computing likelihood are set to {5, 10, 15} in Maze2D and Stochastic Environments and {10, 20, 30} in RLBench. The total number of diffusion steps, corresponding to the number of diffusion steps for Replan from scratch, is set to 256 in Maze2D, 200 in Stochastic Environments, and 400 in RLBench. The number of diffusion steps for Replan with future is set to 80 in Maze2D, 40 in Stochastic Environments, and 100 in RLBench. (These values are collected into the config sketch below.) |
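
The three algorithms named in the Pseudocode row decide whether to keep executing the current plan, lightly regenerate it with a few denoising steps ("Replan with future"), or regenerate it from pure noise ("Replan from scratch"), based on how likely the remaining plan looks under the trained diffusion model. Below is a minimal sketch of that dispatch, not the paper's implementation: the reconstruction-based likelihood estimate is our assumption, `q_sample` and `denoise` are hypothetical stand-ins for the model's forward and reverse processes, and the comparison direction (replan when the plan looks unlikely) is also assumed.

```python
import torch


@torch.no_grad()
def plan_likelihood(q_sample, denoise, plan, steps=(5, 10, 15)):
    """Assumed estimator: noise the plan to a few shallow diffusion steps
    (the {5, 10, 15} reported for Maze2D), denoise one step, and score the
    one-step reconstruction. `q_sample`/`denoise` are hypothetical stand-ins
    for the trained diffusion model's forward and reverse processes."""
    scores = []
    for i in steps:
        t = torch.full((plan.shape[0],), i, dtype=torch.long)
        noisy = q_sample(plan, t)   # forward-noise the current plan to step i
        recon = denoise(noisy, t)   # single reverse (denoising) step
        scores.append(-((recon - plan) ** 2).mean())  # negative MSE score
    return torch.stack(scores).mean().exp().item()   # map score into (0, 1]


def should_replan(likelihood, mode="from_scratch"):
    """Per-strategy trigger thresholds quoted in the table above
    (0.7 for Replan from scratch, 0.5 for Replan with future);
    the direction of the comparison is our assumption."""
    threshold = {"from_scratch": 0.7, "with_future": 0.5}[mode]
    return likelihood < threshold
```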
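
For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration sketch. The key names here are ours, chosen for readability; only the numeric values come from the paper.

```python
# Key names are illustrative; numeric values are transcribed from the table above.
RDM_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "batch_size": 64,
    "training_steps": 1_000_000,
    "planning_horizon": {
        "maze2d_umaze": 128,   # also Multi2D U-Maze
        "maze2d_medium": 256,  # also Multi2D Medium
        "maze2d_large": 256,   # also Multi2D Large
        "stochastic": 64,
        "rlbench": 64,
    },
    "replan_threshold": {"from_scratch": 0.7, "with_future": 0.5},
    "random_action_prob": 0.03,  # epsilon, stochastic environments only
    "likelihood_diffusion_steps": {
        "maze2d": (5, 10, 15),
        "stochastic": (5, 10, 15),
        "rlbench": (10, 20, 30),
    },
    # Total steps double as the step count for Replan from scratch.
    "total_diffusion_steps": {"maze2d": 256, "stochastic": 200, "rlbench": 400},
    "replan_with_future_steps": {"maze2d": 80, "stochastic": 40, "rlbench": 100},
}
```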