Rolling Diffusion Models

Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show that when the temporal dynamics are complex, Rolling Diffusion is superior to standard diffusion. In particular, this result is demonstrated in a video prediction task using the Kinetics-600 video dataset and in a chaotic fluid dynamics forecasting experiment."
Researcher Affiliation | Collaboration | "¹Google DeepMind, Amsterdam, Netherlands; ²University of Amsterdam, Netherlands."
Pseudocode | Yes | "Algorithm 1 Rolling Diffusion: Training; Algorithm 2 Rolling Diffusion: Rollout" (a hedged sketch of both algorithms follows the table)
Open Source Code | No | The paper neither states unambiguously that code for the described methodology will be released nor links to a source-code repository.
Open Datasets | Yes | "video prediction task using the Kinetics-600 video dataset (Kay et al., 2017) and in an experiment involving chaotic fluid mechanics simulations. ... BAIR robot pushing dataset (Ebert et al., 2017) is a standard benchmark for video prediction."
Dataset Splits | No | The paper uses standard benchmarks such as BAIR and Kinetics-600 and refers to "evaluation sets" and test-time setups, but it does not give explicit train/validation/test splits (percentages, sample counts, or named predefined splits) for reproduction.
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU or CPU models, or detailed cloud computing resources.
Software Dependencies | No | The paper refers to frameworks such as JAX-CFD and the Simple Diffusion architecture but provides no version numbers for any software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | "Appendix C. Hyperparameters. Throughout the experiments we use U-ViTs, which are essentially U-Nets with MLP blocks instead of convolutional layers when self-attention is used in a block. ... Blocks: [3 + 3, 3 + 3, 3 + 3, 8]; Channels: [128, 256, 512, 1024]; Head Dim: 128; Dropout: [0, 0.1, 0.1, 0.1]; ... learning rate: 1e-4" (collected into a config sketch below)
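
As a reference for the Pseudocode row, here is a minimal JAX sketch of what Algorithms 1 and 2 describe. It is not the authors' code: the `denoiser(params, z, t_k)` signature, the cosine noise schedule, v-prediction, the deterministic DDIM-style sampler, and the linear local-time assignment t_k = (k + t)/W over a window of W frames are all assumptions made for illustration, and the paper's boundary conditions for the start of a video are omitted.

```python
# Hedged sketch of Rolling Diffusion training (Algorithm 1) and rollout
# (Algorithm 2). Illustrative only: schedule, parameterization, and the
# denoiser interface are assumptions, not the authors' implementation.
import jax
import jax.numpy as jnp

def local_times(t, W):
    """Per-frame diffusion times for a window of W frames.

    Frame k gets local time (k + t) / W: the oldest frame (k = 0) is
    nearly clean, the newest (k = W - 1) is nearly pure noise.
    """
    return jnp.clip((jnp.arange(W) + t) / W, 0.0, 1.0)

def _alpha_sigma(t_k):
    # Assumed variance-preserving cosine schedule, broadcast over frames.
    a = jnp.cos(0.5 * jnp.pi * t_k)[:, None, None, None]
    s = jnp.sin(0.5 * jnp.pi * t_k)[:, None, None, None]
    return a, s

def training_loss(params, denoiser, frames, key):
    """v-prediction loss on one window of shape (W, H, W_img, C)."""
    W = frames.shape[0]
    t_key, eps_key = jax.random.split(key)
    t = jax.random.uniform(t_key)           # shared window time in [0, 1)
    t_k = local_times(t, W)                 # monotone per-frame noise levels
    eps = jax.random.normal(eps_key, frames.shape)
    a, s = _alpha_sigma(t_k)
    z = a * frames + s * eps                # noise each frame at its own t_k
    v = a * eps - s * frames                # v-prediction target (assumed)
    v_hat = denoiser(params, z, t_k)        # network conditions on t_k
    return jnp.mean((v_hat - v) ** 2)

def rollout(params, denoiser, window, key, n_new_frames, n_steps=32):
    """Sliding-window sampling: emit one clean frame per window slide."""
    W = window.shape[0]
    out = []
    for _ in range(n_new_frames):
        # Sweep the shared time t from 1 to 0; the oldest frame's local
        # time reaches 0 (clean) while the newest reaches (W - 1) / W.
        for i in range(n_steps):
            t, t_next = 1.0 - i / n_steps, 1.0 - (i + 1) / n_steps
            t_k, t_k_next = local_times(t, W), local_times(t_next, W)
            a, s = _alpha_sigma(t_k)
            v_hat = denoiser(params, window, t_k)
            x_hat = a * window - s * v_hat        # predicted clean frames
            eps_hat = s * window + a * v_hat      # implied noise
            a_n, s_n = _alpha_sigma(t_k_next)
            window = a_n * x_hat + s_n * eps_hat  # deterministic DDIM step
        out.append(window[0])                     # oldest frame is now clean
        key, sub = jax.random.split(key)
        fresh = jax.random.normal(sub, window[:1].shape)
        window = jnp.concatenate([window[1:], fresh])  # slide window by one
    return jnp.stack(out)
```

The key property this sketch preserves is that after one full sweep (t: 1 → 0) and a window shift, every surviving frame lands at exactly the local time it would have at the start of the next sweep, so the denoising process "rolls" seamlessly across frame positions.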
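For the Experiment Setup row, the Appendix C values quoted above are gathered below into a single configuration for readability. The field names and the reading of "3 + 3" as down plus up blocks per resolution are ours, not an API or notation from the paper; values not quoted (optimizer, batch size, training steps) are left out rather than guessed.

```python
# Appendix C hyperparameters quoted in the table, collected in one place.
# Field names are illustrative; only values quoted from the paper appear.
uvit_hparams = dict(
    blocks=[3 + 3, 3 + 3, 3 + 3, 8],  # our reading: down + up per stage; 8 middle
    channels=[128, 256, 512, 1024],
    head_dim=128,
    dropout=[0.0, 0.1, 0.1, 0.1],
    learning_rate=1e-4,
)
```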