Privacy for Free: How does Dataset Condensation Help Privacy?

Authors: Tian Dong, Bo Zhao, Lingjuan Lyu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks. Through empirical evaluations on image datasets, we validate that DC-synthesized data can preserve both data efficiency and membership privacy when being used for model training." (A minimal loss-based MIA sketch appears after this table.)
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) School of Informatics, The University of Edinburgh; (3) Sony AI.
Pseudocode | No | The paper contains mathematical equations and method descriptions, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper makes no explicit statement that source code is available and provides no link to a code repository for the described methodology.
Open Datasets | Yes | "We use three datasets: Fashion MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015) for gender classification." (A torchvision loading sketch appears after this table.)
Dataset Splits | No | The paper describes how each dataset is split into member and non-member subsets for the MIA evaluation (e.g., 'randomly split it into two subsets of equal amount of samples'), but it does not define a validation split for model training that is distinct from the training and test sets. (A sketch of the member/non-member split appears after this table.)
Hardware Specification | No | The paper states 'All experiments are conducted with Pytorch 1.10 on a Ubuntu 20.04 server.' but does not provide specific hardware details such as GPU/CPU models or memory capacity.
Software Dependencies | Yes | "All experiments are conducted with Pytorch 1.10 on a Ubuntu 20.04 server."
Experiment Setup | Yes | "One important hyperparameter of DSA, DM and KIP is the ratio of image per class r_ipc = |S| / |T|. We evaluate r_ipc = 0.002, 0.01 for all methods, and for DM we add an extra evaluation r_ipc = 0.02... We reproduce DM (Zhao & Bilen, 2021a) and adopt large learning rates to accelerate the condensation (i.e., 10, 50, 100 as learning rate for r_ipc = 0.002, 0.01, 0.02, respectively). ...set learning rate 0.04 and 0.1 for r_ipc = 0.002 and 0.01, respectively." (A sketch of the r_ipc / learning-rate configuration appears after this table.)
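
The Research Type row cites a loss-based membership inference attack. The following is a minimal sketch of a loss-threshold MIA, not the paper's exact procedure: `model`, the batch `(x, y)`, and the `threshold` calibration are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def loss_based_mia(model, x, y, threshold):
    """Flag samples whose per-sample cross-entropy loss falls below `threshold` as members.

    Intuition: a model usually fits its training (member) samples more tightly,
    so members tend to have lower loss than non-members.
    """
    model.eval()
    with torch.no_grad():
        per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
    return per_sample_loss < threshold  # True = predicted member
```

In practice the threshold would be calibrated on held-out data; the likelihood-based attack mentioned in the same row uses shadow models rather than a single loss cutoff.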
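
For the Open Datasets row, a minimal sketch of loading the three public datasets via torchvision; the `root` path and transforms are placeholders, and the CelebA gender-label convention in the comment is an assumption rather than something stated in the row.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Fashion MNIST and CIFAR-10 training splits, downloaded on first use.
fmnist = datasets.FashionMNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

# CelebA is used for gender classification; the binary label is commonly taken
# from the "Male" attribute in torchvision's attribute vector (target_type="attr").
celeba = datasets.CelebA(root="data", split="train", target_type="attr",
                         download=True, transform=to_tensor)
```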
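
For the Dataset Splits row, a minimal sketch of the quoted member/non-member split, assuming a standard PyTorch `Dataset`; the 50/50 random split follows the quoted description, while the helper name and seed handling are illustrative.

```python
import torch
from torch.utils.data import random_split

def member_nonmember_split(dataset, seed=0):
    """Randomly split a dataset into two equally sized halves:
    one used to train the target model (members), the other held out (non-members)."""
    n = len(dataset)
    generator = torch.Generator().manual_seed(seed)
    members, nonmembers = random_split(dataset, [n // 2, n - n // 2], generator=generator)
    return members, nonmembers
```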
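
The Experiment Setup row ties the condensation learning rate for DM to the ratio r_ipc = |S| / |T|. A minimal sketch of that mapping, reproducing only the DM values quoted above; the helper `images_per_class` is hypothetical, and the CIFAR-10 numbers in the example are standard dataset facts, not taken from the row.

```python
# Learning rates quoted for DM at each evaluated ratio r_ipc.
DM_LEARNING_RATE = {0.002: 10, 0.01: 50, 0.02: 100}

def images_per_class(r_ipc, real_train_size, num_classes):
    """Turn the ratio r_ipc = |S| / |T| into a per-class budget for the synthetic set S."""
    return max(1, round(r_ipc * real_train_size / num_classes))

# Example: CIFAR-10 has 50,000 training images over 10 classes,
# so r_ipc = 0.01 corresponds to 50 synthetic images per class.
assert images_per_class(0.01, 50_000, 10) == 50
```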