Membership Inference Attacks on Diffusion Models via Quantile Regression

Authors: Shuai Tang, Steven Wu, Sergul Aydore, Michael Kearns, Aaron Roth

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our attack on diffusion models trained on image datasets, and demonstrate four major advantages: I. Our quantile-regression-based attack obtains state-of-the-art accuracy on several popular vision datasets. [...] Numerical results are presented in Table 1, and they are averaged across 10 random seeds.
Researcher Affiliation | Collaboration | ¹Amazon AWS AI/ML, ²Carnegie Mellon University, ³University of Pennsylvania.
Pseudocode | Yes | Algorithm 1: Quantile Regression MI Attacks for Diffusion Model; Algorithm 2: Bag of Weak Attackers.
Open Source Code | No | The paper states that it adopted a publicly available GitHub repository (https://github.com/kuangliu/pytorch-cifar) for the base architecture of its attack models and used the released codebase by Duan et al. (https://github.com/jinhaoduan/SecMI) for training the target diffusion models. However, it does not state that the source code for the authors' own implementation of the quantile-regression MI attack is publicly available.
Open Datasets | Yes | We demonstrate the effectiveness of our membership inference attack via quantile regression on four denoising diffusion probabilistic models (DDPMs) (Ho et al., 2020) trained on CIFAR-10, CIFAR-100 (Krizhevsky, 2009), STL10 (Coates et al., 2011) and Tiny-ImageNet, respectively.
Dataset Splits | No | The paper states: 'On each dataset, data samples are split into two halves, and one half is regarded as the private samples Z for training a DDPM. The other half is then split into two sets, including one as the public samples D that are auxiliary information, and the other as the holdout set for testing.' This specifies the target model's training data, the attack's auxiliary data, and a holdout test set, but it gives no explicit validation splits or percentages for either the target model or the attack models.
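The split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 50/50 second-level split of the non-private half into D and the holdout set is an assumption, since the paper does not report those proportions, and `make_splits` and its `seed` parameter are hypothetical names.

```python
import random

def make_splits(samples, seed=0):
    """Split per the paper's description: one half becomes the private
    set Z used to train the DDPM; the remaining half is split again into
    the public auxiliary set D and a holdout test set.

    Assumption: the second-level split is 50/50 (not reported in the paper).
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    private_z = shuffled[:half]          # trains the target diffusion model
    rest = shuffled[half:]

    quarter = len(rest) // 2
    public_d = rest[:quarter]            # auxiliary data for the attack
    holdout = rest[quarter:]             # held out for testing the attack
    return private_z, public_d, holdout

z, d, h = make_splits(list(range(50000)))
print(len(z), len(d), len(h))  # 25000 12500 12500
```

For a CIFAR-10-sized dataset of 50,000 samples, this yields 25,000 private training samples and 12,500 samples each for the auxiliary and holdout sets under the assumed proportions.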
Hardware Specification | Yes | Each diffusion model was trained with 80k steps, and it took around 2 days to finish training on a single V100 GPU card.
Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015)' as the optimizer and that attack models use a 'ResNet architecture' from the 'pytorch-cifar' GitHub repository, implying PyTorch. It also states that target models were trained using a codebase by Duan et al. However, no specific version numbers for Python, PyTorch, or any other libraries are provided, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | All attack models in our experiments are trained with the same following optimization settings: 1. optimizer: Adam (Kingma & Ba, 2015); 2. batch size: 128; 3. initial learning rate: 1e-3; 4. number of training epochs: 200; 5. learning rate scheduler: cosine annealing without warm restarts (Loshchilov & Hutter, 2017). For membership inference attacks, we use a fixed t = 50 in the t-error function.
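The learning-rate schedule cited above (cosine annealing without warm restarts, Loshchilov & Hutter, 2017) can be sketched directly from its closed form. This is an illustration, not the authors' code: the minimum learning rate `min_lr = 0.0` is an assumption, as the paper reports only the initial rate and epoch count, and `cosine_annealing_lr` is a hypothetical name.

```python
import math

def cosine_annealing_lr(epoch, initial_lr=1e-3, total_epochs=200, min_lr=0.0):
    """Cosine annealing without warm restarts (Loshchilov & Hutter, 2017).

    Decays the learning rate from initial_lr down to min_lr over
    total_epochs, following half a cosine period.
    Assumption: min_lr = 0.0 (the paper does not report a floor).
    """
    cos_term = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return min_lr + (initial_lr - min_lr) * cos_term

print(cosine_annealing_lr(0))    # 0.001  (reported initial learning rate)
print(cosine_annealing_lr(100))  # 0.0005 (halfway through the 200 epochs)
print(cosine_annealing_lr(200))  # ≈ 0.0  (fully annealed)
```

In a PyTorch training loop this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)` wrapped around the reported Adam optimizer with an initial rate of 1e-3.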