Intriguing Properties of Data Attribution on Diffusion Models

Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation.
Researcher Affiliation | Collaboration | ¹Singapore Management University, ²Sea AI Lab, Singapore; {zhengxs, tianyupang, duchao, linmin}@sea.com; jingjiang@smu.edu.sg
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | The code is available at https://github.com/sail-sg/D-TRAK.
Open Datasets | Yes | Our experiments are conducted on three datasets including CIFAR (32 × 32), CelebA (64 × 64), and ArtBench (256 × 256). More details of datasets can be found in Appendix A.1. CIFAR (32 × 32): The CIFAR-10 dataset (Krizhevsky et al., 2009) contains 50,000 training samples. CelebA (64 × 64): We sample a subset of 5,000 training samples and 1,000 validation samples from the original training set and test set of CelebA (Liu et al., 2015). ArtBench (256 × 256): ArtBench (Liao et al., 2022) is a dataset for artwork generation.
Dataset Splits | Yes | We randomly sample 1,000 validation samples from CIFAR-10’s test set for LDS evaluation. To reduce computation, we also construct a CIFAR-2 dataset as a subset of CIFAR-10, which consists of 5,000 training samples randomly sampled from CIFAR-10’s training samples corresponding to the automobile and horse classes, and 1,000 validation samples randomly sampled from CIFAR-10’s test set corresponding to the same two classes.
Hardware Specification | Yes | For all of our experiments, we use 64 CPU cores and NVIDIA A100 GPUs each with 40GB of memory.
Software Dependencies | Yes | In this paper, we train various diffusion models for different datasets using the Diffusers library. We compute the per-sample gradient following a tutorial of the PyTorch library (version 2.0.1). We use the trak library to project gradients with a random projection matrix, which is implemented using a faster custom CUDA kernel.
Experiment Setup | Yes | The maximum timestep is T = 1000, and we choose the linear variance schedule for the forward diffusion process, from β_1 = 10^-4 to β_T = 0.02. We set the dropout rate to 0.1, employ the AdamW (Loshchilov & Hutter, 2019) optimizer with a weight decay of 10^-6, and augment the data with random horizontal flips. A DDPM is trained for 200 epochs with a batch size of 128, using a cosine annealing learning rate schedule with a 0.1 fraction warmup and an initial learning rate of 10^-4.
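
The Research Type row above cites the linear datamodeling score (LDS) as an evaluation metric. For reference, the block below sketches the standard LDS definition from the datamodels/TRAK line of work that this evaluation follows; the notation (attribution scores τ, random subsets S_j, measurement function f) is ours, not quoted from the paper.

```latex
% Sketch of the linear datamodeling score (LDS); notation is illustrative.
% \tau(z)_i        : attribution score assigned to training example i for a target example z
% S_1, \dots, S_M  : random subsets of the training set
% \theta^*(S_j)    : a model retrained from scratch on subset S_j
% f(z; \theta)     : measurement function (e.g., the diffusion loss evaluated at z)
\begin{align}
  g_\tau(z, S_j) &= \sum_{i \in S_j} \tau(z)_i
    && \text{additive prediction of the model output} \\
  \mathrm{LDS}(\tau, z) &= \rho\big(\{ f(z; \theta^*(S_j)) \}_{j=1}^{M},\,
                                    \{ g_\tau(z, S_j) \}_{j=1}^{M}\big)
    && \text{Spearman rank correlation } \rho
\end{align}
```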
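
The Dataset Splits row describes building a CIFAR-2 subset from the automobile and horse classes. Below is a minimal sketch of one way to construct such a subset with torchvision; the random seed and the use of torchvision's CIFAR10 loader are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' code): build a CIFAR-2 subset with 5,000
# training images and 1,000 validation images from the "automobile" and "horse"
# classes of CIFAR-10.
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

rng = np.random.default_rng(seed=0)  # assumed seed; the paper does not state one

train_set = CIFAR10(root="./data", train=True, download=True)
test_set = CIFAR10(root="./data", train=False, download=True)
keep = [train_set.class_to_idx["automobile"], train_set.class_to_idx["horse"]]

def subsample(dataset, num_samples):
    """Randomly choose `num_samples` indices whose label is one of the kept classes."""
    labels = np.asarray(dataset.targets)
    candidates = np.where(np.isin(labels, keep))[0]
    return rng.choice(candidates, size=num_samples, replace=False).tolist()

cifar2_train = Subset(train_set, subsample(train_set, 5_000))   # 5,000 training samples
cifar2_val = Subset(test_set, subsample(test_set, 1_000))       # 1,000 validation samples
```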
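
The Software Dependencies row mentions computing per-sample gradients following a PyTorch tutorial. The sketch below applies that torch.func pattern (functional_call + grad + vmap) to the DDPM noise-prediction loss; the toy UNet2DModel configuration and random tensors are illustrative assumptions, not the paper's actual model or data.

```python
# Minimal sketch, assuming a small diffusers UNet2DModel and the standard DDPM
# noise-prediction (MSE) loss: per-sample gradients via torch.func.
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel
from torch.func import functional_call, grad, vmap

model = UNet2DModel(                       # toy configuration for illustration
    sample_size=32, in_channels=3, out_channels=3, layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D"),
)
params = {k: v.detach() for k, v in model.named_parameters()}
buffers = {k: v.detach() for k, v in model.named_buffers()}

def sample_loss(params, buffers, x_t, t, noise):
    # x_t: one noisy image (C, H, W); t: its scalar timestep; noise: the true noise.
    pred = functional_call(model, (params, buffers), (x_t.unsqueeze(0), t)).sample
    return F.mse_loss(pred, noise.unsqueeze(0))

# Differentiate w.r.t. the parameters and vmap over the batch dimension of
# (x_t, t, noise); params and buffers are shared across samples.
per_sample_grad_fn = vmap(grad(sample_loss), in_dims=(None, None, 0, 0, 0))

x_t = torch.randn(8, 3, 32, 32)            # toy batch of noisy images
t = torch.randint(0, 1000, (8,))           # one timestep per sample
noise = torch.randn_like(x_t)
per_sample_grads = per_sample_grad_fn(params, buffers, x_t, t, noise)
# Each entry gains a leading batch dimension; flattened, these are the gradients
# that would then be randomly projected (e.g., with the trak library).
```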
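
The Experiment Setup row lists the DDPM training hyperparameters. The sketch below wires those values into a Diffusers-style setup as one plausible reconstruction; the model architecture, dataset size, and helper choices are assumptions, not the authors' training script.

```python
# Illustrative reconstruction of the stated training configuration, assuming the
# Diffusers library; values mirror the hyperparameters quoted in the row above.
import torch
from diffusers import DDPMScheduler, UNet2DModel
from diffusers.optimization import get_cosine_schedule_with_warmup
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),              # random horizontal flips
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),     # scale pixels to [-1, 1]
])

noise_scheduler = DDPMScheduler(                    # T = 1000, linear beta schedule
    num_train_timesteps=1000,
    beta_schedule="linear", beta_start=1e-4, beta_end=0.02,
)
model = UNet2DModel(
    sample_size=32, in_channels=3, out_channels=3,
    dropout=0.1,                                    # dropout kwarg in recent diffusers releases
)

epochs, batch_size = 200, 128
steps_per_epoch = 5_000 // batch_size               # assumes the 5,000-sample CIFAR-2 split
total_steps = epochs * steps_per_epoch
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-6)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),        # 0.1 fraction warmup
    num_training_steps=total_steps,                 # cosine annealing over training
)
```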