Diffusion Tuning: Transferring Diffusion Models via Chain of Forgetting

Authors: Jincheng Zhong, Xingzhuo Guo, Jiaxiang Dong, Mingsheng Long

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct comprehensive experiments to evaluate Diff-Tuning, including the transfer of pre-trained Diffusion Transformer models to eight downstream generations and the adaptation of Stable Diffusion to five control conditions with ControlNet."
Researcher Affiliation | Academia | "Jincheng Zhong, Xingzhuo Guo, Jiaxiang Dong, Mingsheng Long. School of Software, BNRist, Tsinghua University, China. {zjc22,gxz23,djx20}@mails.tsinghua.edu.cn, mingsheng@tsinghua.edu.cn"
Pseudocode | Yes | "Algorithm 1 Pseudo-code of Diff-Tuning"
Open Source Code | Yes | "Code is available at this repository: https://github.com/thuml/Diffusion-Tuning."
Open Datasets | Yes | "Class-conditioned generation is a fundamental application of diffusion models. To fully evaluate transfer efficiency, we adhere to the benchmarks with a resolution of 256 × 256 as used in DiffFit [54], including datasets such as Food101 [4], SUN397 [53], DF20-Mini [40], Caltech101 [13], CUB-200-2011 [50], ArtBench-10 [30], Oxford Flowers [37], and Stanford Cars [26]."
Dataset Splits | Yes | "SUN397 [53] ... We evaluate the methods on a random partition of the whole dataset with 76,128 training images, 10,875 validation images and 21,750 test images."
Hardware Specification | Yes | "For each result, we fine-tune 24K iterations with a batch size of 32 for standard fine-tuning and Diff-Tuning, and a batch size of 64 for DiffFit, on one NVIDIA A100 40G GPU."
Software Dependencies | No | "All experiments are implemented by PyTorch and conducted on NVIDIA A100 40G GPUs." However, specific version numbers for PyTorch or other software dependencies are not provided.
Experiment Setup | Yes | "Table 3: Hyperparameters of experiments." Class-conditional: Backbone DiT, Image Size 256, Batch Size 32, Learning Rate 1e-4, Optimizer Adam, Training Steps 24,000, Validation Interval 24,000, Sampling Steps 50, Augmented Dataset Size 200,000. Controlled: Backbone Stable Diffusion v1.5, Image Size 512, Batch Size 4, Learning Rate 1e-5, Optimizer Adam, Training Steps 15,000, Validation Interval 100, Sampling Steps 50, Augmented Dataset Size 30,000.
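The SUN397 row above reports only the partition sizes (76,128 / 10,875 / 21,750), not the paper's actual split files. A minimal sketch of reproducing a random partition with those sizes follows; the seed, the `random_partition` helper name, and the placeholder file names are our assumptions, not the authors' code.

```python
import random

# Partition sizes reported for SUN397 in the dataset-splits row.
N_TRAIN, N_VAL, N_TEST = 76_128, 10_875, 21_750
TOTAL = N_TRAIN + N_VAL + N_TEST  # 108,753 images (~70/10/20 split)

def random_partition(image_paths, seed=0):
    """Shuffle the full image list deterministically and cut it into
    train/val/test slices matching the reported partition sizes."""
    assert len(image_paths) == TOTAL, "expected the full SUN397 subset"
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    train = paths[:N_TRAIN]
    val = paths[N_TRAIN:N_TRAIN + N_VAL]
    test = paths[N_TRAIN + N_VAL:]
    return train, val, test

# Usage with placeholder identifiers instead of real image paths:
train, val, test = random_partition([f"img_{i}.jpg" for i in range(TOTAL)])
print(len(train), len(val), len(test))  # 76128 10875 21750
```

Without the authors' seed, any such split only matches the sizes, not the exact membership of each partition.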
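The Table 3 hyperparameters in the experiment-setup row can be transcribed into configuration objects for a re-implementation. This is only a sketch: the dictionary key names are our own choice, while the values come verbatim from the quoted table.

```python
# Class-conditional setting (DiT backbone), values from Table 3.
CLASS_CONDITIONAL = {
    "backbone": "DiT",
    "image_size": 256,
    "batch_size": 32,
    "learning_rate": 1e-4,
    "optimizer": "Adam",
    "training_steps": 24_000,
    "validation_interval": 24_000,
    "sampling_steps": 50,
    "augmented_dataset_size": 200_000,
}

# Controlled-generation setting (Stable Diffusion v1.5 backbone).
CONTROLLED = {
    "backbone": "Stable Diffusion v1.5",
    "image_size": 512,
    "batch_size": 4,
    "learning_rate": 1e-5,
    "optimizer": "Adam",
    "training_steps": 15_000,
    "validation_interval": 100,
    "sampling_steps": 50,
    "augmented_dataset_size": 30_000,
}
```

Note that the paper reports no PyTorch or CUDA versions, so any reproduction must additionally pin its own software environment.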