Amplifying Membership Exposure via Data Poisoning

Authors: Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our attacks on computer vision benchmarks. Our results show that the proposed attacks can substantially increase the membership inference precision with minimum overall test-time model performance degradation.
Researcher Affiliation | Collaboration | Yufei Chen (1,2), Chao Shen (1), Yun Shen (3), Cong Wang (2), Yang Zhang (4). Affiliations: 1 Xi'an Jiaotong University, 2 City University of Hong Kong, 3 NetApp, 4 CISPA Helmholtz Center for Information Security.
Pseudocode | No | The paper describes its methods in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/yfchen1994/poisoning_membership.
Open Datasets | Yes | We adopt five datasets for our experiments, including (1) MNIST [1] that contains 60,000 handwritten digits from 0 to 9. (2) CIFAR-10 [2] that contains 60,000 images from 10 classes. (3) STL-10 [3] that contains 13,000 labeled images from 10 classes. (4) CelebA [25] that contains 202,599 face images annotated by 40 attributes. (5) PatchCamelyon [46] that contains 327,680 images to predict the presence of metastatic tissue.
Dataset Splits | Yes | In our experiment, we split each dataset into three portions: the clean training dataset Dclean containing the members, the testing dataset Dtest containing the non-members, and the shadow dataset Dshadow for generating poisoning samples. We follow the same setup with [38], where |Dclean| = |Dtest| = |Dshadow|, and each of them does not overlap with others. Additionally, we set the three datasets to be balanced for the simplicity of evaluation among each class. (An illustrative split sketch follows the table.)
Hardware Specification | Yes | Our experiments were conducted on a deep learning server, which is equipped with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz, 128GB RAM, and four NVIDIA GeForce RTX 3090 GPUs with 24GB of memory.
Software Dependencies | Yes | We use five pretrained models provided by TensorFlow (v2.5.2): Xception, ResNet18, MobileNetV2, InceptionV3, and VGG16. (A feature-extractor loading sketch follows the table.)
Experiment Setup | Yes | We fix the feature extractor and train the newly added FC layers with the Adam optimizer, with the learning rate of 10^-3 and batch size of 100. ... The hyperparameters used in our implementation are summarized in Table 4: LEARNING RATE: 0.001 for InceptionV3, 0.01 for others; NOISE MULTIPLIER: 1.0; MAX L2-NORM OF GRADIENTS: 1.0; BATCH SIZE: 100; MICROBATCH SIZE: 100; EPOCHS: 20. (A hedged training-configuration sketch follows the table.)
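
Illustrative split sketch. The Dataset Splits row describes three equal-size, non-overlapping, class-balanced portions (Dclean, Dtest, Dshadow). The NumPy sketch below is a minimal reconstruction of that split, not the authors' code; the function name, per-class count, and dictionary layout are illustrative assumptions.

    # Minimal sketch (not the authors' code): split a labeled dataset into three
    # equal, non-overlapping, class-balanced portions D_clean, D_test, D_shadow.
    import numpy as np

    def balanced_three_way_split(x, y, per_class, num_classes, seed=0):
        """Return a dict with 'clean', 'test', 'shadow' parts, each class-balanced."""
        rng = np.random.default_rng(seed)
        parts = {"clean": ([], []), "test": ([], []), "shadow": ([], [])}
        for c in range(num_classes):
            idx = np.flatnonzero(y == c)          # all indices of class c
            rng.shuffle(idx)
            chunks = (idx[:per_class],
                      idx[per_class:2 * per_class],
                      idx[2 * per_class:3 * per_class])
            for name, chunk in zip(("clean", "test", "shadow"), chunks):
                parts[name][0].append(x[chunk])
                parts[name][1].append(y[chunk])
        # Concatenate per-class chunks into the three disjoint, balanced datasets.
        return {k: (np.concatenate(v[0]), np.concatenate(v[1])) for k, v in parts.items()}

    # Example with CIFAR-10 (the per-class size is illustrative, not the paper's value):
    # (x, y), _ = tf.keras.datasets.cifar10.load_data()
    # splits = balanced_three_way_split(x, y.flatten(), per_class=1000, num_classes=10)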
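
Feature-extractor loading sketch. The Software Dependencies row says the feature extractors are pretrained models shipped with TensorFlow. The sketch below shows one plausible way to load such a backbone via tf.keras.applications as a frozen feature extractor with new FC layers on top; MobileNetV2 is used because it is one of the listed architectures, while the input resolution and FC-layer widths are assumptions, not values from the paper.

    # Hedged sketch: load a pretrained backbone as a frozen feature extractor and
    # stack new fully connected (FC) layers on top. Sizes are assumed, not the paper's.
    import tensorflow as tf

    extractor = tf.keras.applications.MobileNetV2(
        weights="imagenet",          # ImageNet-pretrained weights shipped with Keras
        include_top=False,           # drop the original classification head
        pooling="avg",               # global average pooling -> flat feature vector
        input_shape=(96, 96, 3),     # illustrative input resolution
    )
    extractor.trainable = False      # "fix the feature extractor"

    model = tf.keras.Sequential([
        extractor,
        tf.keras.layers.Dense(128, activation="relu"),    # newly added FC layer (width assumed)
        tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
    ])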
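
Training-configuration sketch. The Experiment Setup row combines ordinary fine-tuning hyperparameters (Adam, learning rate 1e-3, batch size 100, 20 epochs) with differential-privacy hyperparameters (noise multiplier, max L2-norm of gradients, microbatch size) that match the interface of TensorFlow Privacy's DP optimizers. The sketch below reuses the model from the previous sketch; the mapping onto DPKerasAdamOptimizer and the variable names x_clean/y_clean are assumptions, not the authors' implementation.

    # Hedged sketch of the training configuration. The non-private branch follows the
    # quoted setup (Adam, lr 1e-3, batch size 100); the DP branch is an assumed
    # mapping of the Table 4 hyperparameters onto TensorFlow Privacy.
    import tensorflow as tf
    # from tensorflow_privacy import DPKerasAdamOptimizer   # pip install tensorflow-privacy

    # Standard training of the newly added FC layers (feature extractor stays frozen):
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # model.fit(x_clean, y_clean, batch_size=100, epochs=20)   # variable names illustrative

    # Differentially private variant (assumed wiring of the Table 4 values):
    # dp_opt = DPKerasAdamOptimizer(
    #     l2_norm_clip=1.0,        # MAX L2-NORM OF GRADIENTS
    #     noise_multiplier=1.0,    # NOISE MULTIPLIER
    #     num_microbatches=100,    # MICROBATCH SIZE
    #     learning_rate=1e-3,
    # )
    # model.compile(
    #     optimizer=dp_opt,
    #     loss=tf.keras.losses.SparseCategoricalCrossentropy(
    #         reduction=tf.keras.losses.Reduction.NONE),      # per-example losses for DP
    #     metrics=["accuracy"],
    # )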