Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy
Authors: Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and dataset scales. Additionally, our method shows superior resistance to overfitting mitigation strategies, such as early stopping and data augmentation. We conduct extensive experiments on three text-to-image datasets [32, 35, 66] with various data distributions and dataset scales, using the mainstream open-sourced text-to-image diffusion models [11, 47] under both fine-tuning and pretraining settings. |
| Researcher Affiliation | Collaboration | Shengfang Zhai1,2, Huanran Chen3,6, Yinpeng Dong3,6, Jiajun Li1,2, Qingni Shen1,2, Yansong Gao4, Hang Su3,5, Yang Liu7. 1 School of Software and Microelectronics, Peking University; 2 PKU-OCTA Laboratory for Blockchain and Privacy Computing, Peking University; 3 Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University; 4 The University of Western Australia; 5 Zhongguancun Laboratory, Beijing, China; 6 RealAI; 7 Nanyang Technological University |
| Pseudocode | No | The paper describes methods and processes in text and mathematical formulas, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide all the implementation details and the hyper-parameters, and we provide the code in the supplemental material. All the models and datasets utilized in this paper are open-sourced. |
| Open Datasets | Yes | We conduct extensive experiments on three text-to-image datasets [32, 35, 66] with various data distributions and dataset scales, using the mainstream open-sourced text-to-image diffusion models [11, 47] under both fine-tuning and pretraining settings. For the fine-tuning setting, we select 416/417 samples on Pokémon [32], 2,500/2,500 samples on MS-COCO [35] and 10,000/10,000 samples on Flickr [66] as the member/hold-out dataset, respectively. For the pretraining setting, we conduct experiments on Stable Diffusion v1-5 [47] using the processed LAION dataset [51]. |
| Dataset Splits | No | The paper mentions member/hold-out dataset splits, but does not explicitly detail a separate validation split with percentages or counts for hyperparameter tuning. It mentions using a 'shadow model to obtain the α for Eq. (16), classifiers for Eq. (18) and the threshold τ for calculating ASR with auxiliary datasets of the same distribution', which implies a form of validation but not a distinct data split labelled as such (an illustrative threshold-calibration sketch appears after this table). |
| Hardware Specification | Yes | Our experiments are divided into two main parts: training (fine-tuning) and inference, both conducted on a single RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions software like Huggingface [24], XGBoost [9], and implicitly uses PyTorch/Python, but does not specify exact version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | For fine-tuning, previous membership inference on text-to-image diffusion models usually relies on strong overfitting settings. To evaluate performance more realistically, we consider the following two setups: (1) Over-training. Following previous works [15, 17, 28], we fine-tune for 15,000 steps on the Pokémon dataset, and for 150,000 steps on MS-COCO and Flickr (with a dataset size of only 2,500/2,500). (2) Real-world training... Thus, we train for 7,500 steps, 50,000 steps and 200,000 steps on the Pokémon, MS-COCO and Flickr datasets, respectively. Additionally, we employ the default data augmentation (Random-Crop and Random-Flip) from the training code [25] to simulate real-world scenarios. We set M, N = 3 to balance a low query count against satisfactory performance... we use the time list [440, 450, 460], resulting in a query count of 15. |
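
The Experiment Setup row above cites the default Random-Crop and Random-Flip augmentation from the training code [25]. As a point of reference, here is a minimal sketch of such a pipeline, assuming the torchvision transforms used by standard Hugging Face diffusers text-to-image fine-tuning scripts; the 512×512 resolution and the exact transform ordering are assumptions, not details stated in the paper excerpts above.

```python
from torchvision import transforms

# Illustrative default augmentation for text-to-image fine-tuning:
# Random-Crop and Random-Flip, in the style of the Hugging Face diffusers
# example training script. The resolution of 512 is an assumption.
resolution = 512
train_transforms = transforms.Compose([
    transforms.Resize(resolution, interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.RandomCrop(resolution),       # Random-Crop
    transforms.RandomHorizontalFlip(p=0.5),  # Random-Flip
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
```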
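
The Dataset Splits row notes that the threshold τ for computing ASR is calibrated with auxiliary datasets of the same distribution rather than with a labelled validation split. The sketch below illustrates one way such a calibration could look, assuming per-sample membership scores where larger values indicate membership; the function name and the brute-force sweep over candidate thresholds are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_threshold(aux_member_scores: np.ndarray, aux_holdout_scores: np.ndarray) -> float:
    """Choose tau maximizing attack success rate (ASR) on an auxiliary split.

    Assumes 1-D arrays of membership scores where higher means "more likely
    a member". The auxiliary split plays the role of a validation set.
    """
    scores = np.concatenate([aux_member_scores, aux_holdout_scores])
    labels = np.concatenate([np.ones(len(aux_member_scores)), np.zeros(len(aux_holdout_scores))])
    best_tau, best_asr = float(scores.min()), 0.0
    for tau in np.unique(scores):              # sweep observed score values
        preds = (scores >= tau).astype(float)  # predict "member" at or above tau
        asr = float((preds == labels).mean())  # accuracy = ASR on auxiliary data
        if asr > best_asr:
            best_tau, best_asr = float(tau), asr
    return best_tau

# Usage: calibrate on the auxiliary split, then apply to target samples.
# tau = select_threshold(aux_member_scores, aux_holdout_scores)
# is_member = target_scores >= tau
```

This mirrors the observation in the Dataset Splits row: the auxiliary data of the same distribution fills the role of a validation set without being labelled as a distinct split.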