$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States
Authors: Sam Bond-Taylor, Chris G. Willcocks
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on high-resolution datasets, we found that even at an 8× subsampling rate, our model retains high-quality diffusion. |
| Researcher Affiliation | Academia | Sam Bond-Taylor, Chris G. Willcocks, Department of Computer Science, Durham University, {samuel.e.bond-taylor, christopher.g.willcocks}@durham.ac.uk |
| Pseudocode | No | The paper describes methods in prose and with diagrams but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/samb-t/infty-diff. |
| Open Datasets | Yes | We train models on 256×256 datasets, FFHQ (Karras et al., 2019) and LSUN Church (Yu et al., 2015), as well as CelebA-HQ (Karras et al., 2018). |
| Dataset Splits | No | Optimisation is performed using the Adam optimiser (Kingma and Ba, 2015) with a batch size of 32 and learning rate of 5×10⁻⁵; each model being trained to optimise validation loss. |
| Hardware Specification | Yes | All 256×256 models are trained on a single NVIDIA A100 80GB GPU using automatic mixed precision. |
| Software Dependencies | No | Optimisation is performed using the Adam optimiser (Kingma and Ba, 2015) |
| Experiment Setup | Yes | Optimisation is performed using the Adam optimiser (Kingma and Ba, 2015) with a batch size of 32 and learning rate of 5×10⁻⁵; each model being trained to optimise validation loss. Each model is trained as a diffusion autoencoder to reduce training variance, allowing much smaller batch sizes thereby permitting training on a single GPU. A latent size of 1024 is used and the latent model architecture and diffusion hyperparameters are the same as used by Preechakul et al. (2022). In image space, the diffusion model uses a cosine noise schedule (Nichol and Dhariwal, 2021) with 1000 steps. Mollifying is performed with Gaussian blur with a variance of 1.0. |
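The Experiment Setup row maps onto standard PyTorch components. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation (the released code is in the repository linked above): the blur kernel size and the stand-in `model` are placeholders, while the cosine schedule, step count, learning rate, and blur variance follow the quoted text.

```python
import math
import torch
import torchvision.transforms as T

def cosine_beta_schedule(timesteps=1000, s=0.008, max_beta=0.999):
    """Cosine noise schedule of Nichol & Dhariwal (2021), 1000 steps as quoted."""
    def alpha_bar(t):
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    betas = [
        min(1.0 - alpha_bar((i + 1) / timesteps) / alpha_bar(i / timesteps), max_beta)
        for i in range(timesteps)
    ]
    return torch.tensor(betas)

# Mollification: Gaussian blur with variance 1.0, i.e. sigma = 1.0
# (the kernel size is an assumption; the paper only states the variance).
mollify = T.GaussianBlur(kernel_size=7, sigma=1.0)

# Stand-in module; the paper's actual architecture is in the linked repository.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Optimisation as quoted: Adam with learning rate 5×10⁻⁵ (batch size 32).
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
betas = cosine_beta_schedule(timesteps=1000)
```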
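The 8× subsampling quoted under Research Type refers to training on sparse subsets of coordinates of the mollified state. Below is a rough sketch of that idea, assuming "8×" means retaining one in eight pixel coordinates; `subsample_coords` is a hypothetical helper written for illustration, not part of the released code.

```python
import torch

def subsample_coords(img, rate=8):
    """Randomly keep 1/rate of the pixel coordinates of a mollified image.

    img: (B, C, H, W). Returns pixel coordinates and the corresponding
    values, mimicking the sparse states the model is trained on.
    Hypothetical illustration only, not the authors' implementation.
    """
    B, C, H, W = img.shape
    n = (H * W) // rate
    idx = torch.randperm(H * W)[:n]         # shared across the batch for simplicity
    flat = img.flatten(2)                   # (B, C, H*W)
    values = flat[:, :, idx]                # (B, C, n)
    ys, xs = idx // W, idx % W
    coords = torch.stack([ys, xs], dim=-1)  # (n, 2) pixel coordinates
    return coords, values

x = torch.randn(4, 3, 256, 256)       # a batch of mollified 256×256 images
coords, values = subsample_coords(x)  # values keeps 1/8 of the pixels
```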