Rolling Diffusion Models
Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that when the temporal dynamics are complex, Rolling Diffusion is superior to standard diffusion. In particular, this result is demonstrated in a video prediction task using the Kinetics-600 video dataset and in a chaotic fluid dynamics forecasting experiment. |
| Researcher Affiliation | Collaboration | (1) Google DeepMind, Amsterdam, Netherlands; (2) University of Amsterdam, Netherlands |
| Pseudocode | Yes | Algorithm 1 Rolling Diffusion: Training; Algorithm 2 Rolling Diffusion: Rollout (see the illustrative sketch after the table) |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing code for the described methodology or provide a direct link to a source-code repository. |
| Open Datasets | Yes | video prediction task using the Kinetics-600 video dataset (Kay et al., 2017) and in an experiment involving chaotic fluid mechanics simulations. ... BAIR robot pushing dataset (Ebert et al., 2017) is a standard benchmark for video prediction. |
| Dataset Splits | No | The paper mentions datasets like BAIR and Kinetics-600, which are standard benchmarks, and refers to 'evaluation sets' or 'test-time' setups. However, it does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits by name) for reproduction. |
| Hardware Specification | No | The paper does not specify the exact hardware used for experiments, such as specific GPU or CPU models, or detailed cloud computing resources. |
| Software Dependencies | No | The paper refers to frameworks like Jax CFD and Simple Diffusion architecture but does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | Appendix C (Hyperparameters): Throughout the experiments we use U-ViTs, which are essentially U-Nets with MLP blocks instead of convolutional layers where self-attention is used in a block. ... Blocks [3 + 3, 3 + 3, 3 + 3, 8]; Channels [128, 256, 512, 1024]; Head Dim 128; Dropout [0, 0.1, 0.1, 0.1]; ... learning rate 1e-4 (collected into a config sketch after the table) |
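The paper's Algorithm 1 (Training) and Algorithm 2 (Rollout) are not reproduced verbatim here. The sketch below is a minimal JAX illustration of the idea those algorithms rely on: every frame in a sliding window gets its own noise level, increasing toward the end of the window, and rollout repeatedly denoises the window, emits the now-clean first frame, and appends fresh pure noise at the back. The function names (`frame_noise_levels`, `corrupt_window`, `training_loss`), the cosine noise schedule, and the denoiser signature `apply_fn(params, z, t_frames)` are assumptions for illustration, not the authors' released code.

```python
import jax
import jax.numpy as jnp


def frame_noise_levels(global_t, window):
    """Per-frame diffusion times for one sliding window (illustrative linear
    local schedule): earlier frames are almost clean, later frames are almost
    pure noise, and shifting the window advances every frame's time."""
    k = jnp.arange(window) / window
    return jnp.clip(global_t + k, 0.0, 1.0)


def corrupt_window(key, frames, t_frames):
    """Noise each frame at its own level (cosine variance-preserving sketch)."""
    eps = jax.random.normal(key, frames.shape)
    alpha = jnp.cos(0.5 * jnp.pi * t_frames)[:, None, None, None]
    sigma = jnp.sin(0.5 * jnp.pi * t_frames)[:, None, None, None]
    return alpha * frames + sigma * eps, eps


def training_loss(params, apply_fn, key, frames):
    """Epsilon-prediction loss for one window of shape (window, H, W, C);
    apply_fn(params, z, t_frames) is a hypothetical denoiser signature."""
    window = frames.shape[0]
    t_key, n_key = jax.random.split(key)
    # Sample a window offset so the linear per-frame schedule slides over time.
    global_t = jax.random.uniform(t_key, minval=0.0, maxval=1.0 / window)
    t_frames = frame_noise_levels(global_t, window)
    z, eps = corrupt_window(n_key, frames, t_frames)
    eps_hat = apply_fn(params, z, t_frames)
    return jnp.mean((eps_hat - eps) ** 2)
```

Rollout (Algorithm 2) would then repeat: denoise the window until its first frame reaches time 0, emit that frame, drop it, and append a new frame of pure noise at the back so the window keeps rolling forward. This sketch omits how the very first window is generated.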
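For convenience, the quoted Appendix C values can be gathered into a single configuration object. The sketch below is a hypothetical container: only the listed values come from the table above; the field names and dataclass layout are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UViTConfig:
    """Illustrative config holding the hyperparameters quoted from Appendix C."""
    blocks: List[int] = field(default_factory=lambda: [3 + 3, 3 + 3, 3 + 3, 8])
    channels: List[int] = field(default_factory=lambda: [128, 256, 512, 1024])
    head_dim: int = 128
    dropout: List[float] = field(default_factory=lambda: [0.0, 0.1, 0.1, 0.1])
    learning_rate: float = 1e-4


config = UViTConfig()
print(config)
```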