DEFT: Efficient Fine-tuning of Diffusion Models by Learning the Generalised $h$-transform

Authors: Alexander Denker, Francisco Vargas, Shreyas Padhy, Kieran Didi, Simon Mathis, Riccardo Barbano, Vincent Dutordoir, Emile Mathieu, Urszula Julia Komorowska, Pietro Lió

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | DEFT is much faster than existing baselines while achieving state-of-the-art performance across a variety of linear and non-linear benchmarks. On image reconstruction tasks, we achieve speedups of up to 1.6×, while having the best perceptual quality on natural images and reconstruction performance on medical images. Further, we also provide initial experiments on protein motif scaffolding and outperform reconstruction guidance methods.
Researcher Affiliation | Collaboration | Alexander Denker (University College London, a.denker@ucl.ac.uk); Francisco Vargas* (University of Cambridge, fav25@cam.ac.uk); Shreyas Padhy* (University of Cambridge, sp2058@cam.ac.uk); Kieran Didi* (University of Cambridge, ked48@cam.ac.uk); Simon Mathis* (University of Cambridge, svm34@cam.ac.uk); Vincent Dutordoir (University of Cambridge, vd309@cam.ac.uk); Riccardo Barbano (Atinary Technologies, rbarbano@atinary.com); Emile Mathieu (University of Cambridge, ebm32@cam.ac.uk); Urszula Julia Komorowska (University of Cambridge, ujk21@cam.ac.uk); Pietro Lió (University of Cambridge, pl219@cam.ac.uk)
Pseudocode | Yes | Algorithm 1: Unconditional training of denoising diffusion models [28] (Appendix I); a minimal PyTorch sketch of this training loop follows the table.
Open Source Code | Yes | We provide our code at https://github.com/alexdenker/DEFT.
Open Datasets | Yes | We test a wide variety of both linear and non-linear image reconstruction tasks on the 256×256px ImageNet dataset [58]. (Section 4.1) and We evaluate DEFT on both the 2016 American Association of Physicists in Medicine (AAPM) grand challenge dataset [45] and the LoDoPaB-CT dataset [35]. (Section 4.2)
Dataset Splits | Yes | We perform all our evaluations on a 1k subset of the validation set. For all inverse problems under consideration, the h-transform was trained on a separate 1k subset of the validation set. (Section 4.1) and The training set contains 35820 images and was used to train the unconditional diffusion model. The validation set contains 3522 images and was used to train the h-transform model and for a hyperparameter search for DPS and RED-diff. (Appendix E.1)
Hardware Specification | Yes | This time is calculated by fitting the largest batch size of validation images that fit on a single A100 GPU and dividing the time taken for the batch by the batch size. (Section 4) and Time (s): 208.9, 83.4, 16.3, 156.8, 70.1, 13.8 (time per image on a single GeForce RTX 3090; Table 4). A timing sketch following this protocol is given after the table.
Software Dependencies | No | The paper mentions software like the Adam optimiser and, implicitly, PyTorch (e.g., 'torch autograd'), but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | In the image experiments, we use the DDPM [28] formulation for the diffusion model with $N = 1000$ steps and a linear $\beta$-schedule with $\beta_0 = 10^{-4}$ and $\beta_N = 2 \times 10^{-2}$. (Appendix E.1) and The fine-tuning network was trained for 200 epochs using a batch size of 16, and the Adam optimiser with a learning rate of $5 \times 10^{-4}$ and annealing. (Appendix E.1) See the schedule and optimiser sketch after the table.
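
The pseudocode referenced above (Algorithm 1, Appendix I) is the standard unconditional DDPM training loop of [28]. A minimal PyTorch sketch of that loop, assuming a generic noise-prediction network `eps_model` (name hypothetical, not from the paper):

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(eps_model, x0, betas):
    """One unconditional DDPM training step (cf. Algorithm 1 of [28]).

    eps_model: network predicting the noise eps from (x_t, t) -- hypothetical name.
    x0:        batch of clean training images, shape (B, C, H, W).
    betas:     noise schedule, shape (N,).
    """
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)              # \bar{\alpha}_t
    t = torch.randint(0, len(betas), (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)                                  # eps ~ N(0, I)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps        # forward noising
    return F.mse_loss(eps_model(x_t, t), eps)                   # simple L2 objective
```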
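The reported experiment setup maps directly onto code. A sketch under the stated hyperparameters ($N = 1000$, linear schedule from $10^{-4}$ to $2 \times 10^{-2}$, Adam at a learning rate of 5e-4); the `h_net` stand-in module and the cosine annealing scheme are assumptions, since the paper does not specify which annealing scheme is used:

```python
import torch

N = 1000
betas = torch.linspace(1e-4, 2e-2, N)  # linear beta-schedule from Appendix E.1

# Fine-tuning setup: 200 epochs, batch size 16, Adam at 5e-4 with annealing.
# `h_net` is a placeholder for the h-transform network; cosine annealing is
# an assumption, as the paper does not name the annealing scheme.
h_net = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in module for illustration
optimizer = torch.optim.Adam(h_net.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```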
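Finally, the per-image timing protocol quoted under Hardware Specification (time one full batch at the largest batch size that fits on the GPU, then divide by the batch size) can be reproduced as follows; `sample_batch` is a hypothetical stand-in for the model's sampling call:

```python
import time
import torch

def time_per_image(sample_batch, batch_size):
    """Wall-clock sampling time per image, as described in Section 4."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # flush pending GPU work before timing
    start = time.perf_counter()
    sample_batch(batch_size)          # hypothetical sampling call for one batch
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait for sampling to actually finish
    return (time.perf_counter() - start) / batch_size
```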