Planning with Diffusion for Flexible Behavior Synthesis
Authors: Michael Janner, Yilun Du, Joshua Tenenbaum, Sergey Levine
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Evaluation The focus of our experiments is to evaluate Diffuser on the capabilities we would like from a data-driven planner. In particular, we evaluate (1) the ability to plan over long horizons without manual reward shaping, (2) the ability to generalize to new configurations of goals unseen during training, and (3) the ability to recover an effective controller from heterogeneous data of varying quality. We conclude by studying practical runtime considerations of diffusion-based planning, including the most effective ways of speeding up the planning procedure while suffering minimally in terms of performance. Table 1. (Long-horizon planning) The performance of Diffuser and prior model-free algorithms in the Maze2D environment... Table 2. (Offline reinforcement learning) The performance of Diffuser and a variety of prior algorithms on the D4RL locomotion benchmark... Table 3. (Test-time flexibility) Performance of BCQ, CQL, and Diffuser on block stacking tasks. |
| Researcher Affiliation | Academia | ¹University of California, Berkeley; ²MIT. Correspondence to: janner@berkeley.edu, yilundu@mit.edu. |
| Pseudocode | Yes | Algorithm 1 Guided Diffusion Planning (a sketch of this guided sampling loop appears after the table) |
| Open Source Code | Yes | Code and visualizations of the learned denoising process are available at diffusion-planning.github.io. |
| Open Datasets | Yes | We evaluate long-horizon planning in the Maze2D environments (Fu et al., 2020)... Finally, we evaluate the capacity to recover an effective single-task controller from heterogeneous data of varying quality using the D4RL offline locomotion suite (Fu et al., 2020). |
| Dataset Splits | No | The paper evaluates on standard benchmarks (D4RL, Maze2D) that ship with predefined train/validation/test splits, but it does not explicitly state the splits (e.g., percentages or counts for training, validation, and test sets) used for its own experiments in the main text. |
| Hardware Specification | No | The paper does not specify any particular CPU, GPU, or other hardware components used for running the experiments. It only mentions general computing resources in the acknowledgements: 'This work was partially supported by computational resource donations from Microsoft.' This is too vague. |
| Software Dependencies | Yes | We used the following open-source libraries for this work: NumPy (Harris et al., 2020), PyTorch (Paszke et al., 2019), and Diffusion Models in PyTorch (Wang, 2020). Adam optimizer (Kingma & Ba, 2015), group norm (Wu & He, 2018), and Mish nonlinearity (Misra, 2019). |
| Experiment Setup | Yes | We train the model using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 4e-5 and batch size of 32. We train the models for 500k steps. The return predictor J has the structure of the first half of the U-Net used for the diffusion model, with a final linear layer to produce a scalar output. We use a planning horizon T of 100 in all locomotion tasks, 128 for block stacking, 128 in Maze2D / Multi2D U-Maze, 265 in Maze2D / Multi2D Medium, and 384 in Maze2D / Multi2D Large. We use N = 100 diffusion steps. We use a guide scale of α = 0.001. (These hyperparameters are collected in the training sketch after the table.) |
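As a reading aid for the pseudocode row, here is a minimal PyTorch sketch of a guided reverse-diffusion planning loop in the spirit of Algorithm 1: sample noise over the whole trajectory, denoise step by step, bias each step's mean with the gradient of a learned return predictor, and clamp the first state to the current observation. The `denoise_fn` / `return_fn` interfaces, tensor shapes, and toy stand-ins are assumptions for illustration, not the authors' released implementation.

```python
import torch


@torch.no_grad()
def guided_plan(denoise_fn, return_fn, obs, horizon, transition_dim,
                n_diffusion_steps=100, guide_scale=1e-3):
    """Sketch of guided diffusion planning. Assumed (hypothetical) interfaces:
    denoise_fn(tau, i) -> (mu, sigma): reverse-process mean and std at step i;
    return_fn(tau) -> per-sample predicted return J(tau)."""
    # Start from pure noise over the whole plan: (batch, horizon, transition_dim).
    tau = torch.randn(1, horizon, transition_dim)
    for i in reversed(range(n_diffusion_steps)):
        mu, sigma = denoise_fn(tau, i)
        # Bias the mean along the return gradient, scaled by the guide scale
        # alpha and the step covariance (Algorithm 1's mu + alpha * Sigma * grad).
        with torch.enable_grad():
            mu_in = mu.detach().requires_grad_(True)
            grad = torch.autograd.grad(return_fn(mu_in).sum(), mu_in)[0]
        mu = mu + guide_scale * sigma ** 2 * grad
        noise = torch.randn_like(mu) if i > 0 else torch.zeros_like(mu)
        tau = mu + sigma * noise
        # Condition on the current state by clamping the plan's first observation.
        tau[:, 0, : obs.shape[-1]] = obs
    return tau


# Toy stand-ins just to exercise the loop; the real models are learned networks.
def denoise(tau, i):
    return 0.9 * tau, torch.tensor(0.1)

def ret(tau):
    return tau.square().mean(dim=(1, 2))

obs_dim, act_dim = 4, 2
plan = guided_plan(denoise, ret, torch.zeros(obs_dim), horizon=100,
                   transition_dim=obs_dim + act_dim)
```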
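Similarly, a minimal sketch collecting the training hyperparameters quoted in the experiment-setup row (Adam, learning rate 4e-5, batch size 32, 500k steps, N = 100 diffusion steps, guide scale 0.001, per-task horizons). The toy MLP denoiser, the beta-schedule endpoints, the transition dimension, and the random data are placeholders; the paper trains a temporal U-Net (with group norm and Mish) on D4RL trajectories.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Hyperparameters quoted from the experiment-setup row.
LEARNING_RATE = 4e-5
BATCH_SIZE = 32
TRAIN_STEPS = 500_000
N_DIFFUSION_STEPS = 100
GUIDE_SCALE = 1e-3
HORIZON = {"locomotion": 100, "block_stacking": 128, "maze2d_umaze": 128,
           "maze2d_medium": 265, "maze2d_large": 384}

# Placeholder denoiser so the snippet runs; the paper's model is a temporal
# U-Net with group norm and Mish nonlinearities, not this MLP.
transition_dim = 23  # observation_dim + action_dim; this value is hypothetical
denoiser = nn.Sequential(
    nn.Linear(transition_dim, 256),
    nn.Mish(),
    nn.Linear(256, transition_dim),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=LEARNING_RATE)

# A standard DDPM linear noise schedule; these beta endpoints are assumed,
# not stated in the row.
betas = torch.linspace(1e-4, 2e-2, N_DIFFUSION_STEPS)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# One illustrative epsilon-prediction training step on random "trajectories".
tau_0 = torch.randn(BATCH_SIZE, HORIZON["locomotion"], transition_dim)
i = torch.randint(0, N_DIFFUSION_STEPS, (BATCH_SIZE,))
a = alphas_cumprod[i].view(-1, 1, 1)
eps = torch.randn_like(tau_0)
tau_i = a.sqrt() * tau_0 + (1.0 - a).sqrt() * eps
loss = F.mse_loss(denoiser(tau_i), eps)  # the real model also conditions on step i
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In a full run, the single step above would sit inside a loop over TRAIN_STEPS minibatches drawn from the offline dataset rather than random tensors.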