Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics
Authors: Bingzhe Wu, Chaochao Chen, Shiwan Zhao, Cen Chen, Yuan Yao, Guangyu Sun, Li Wang, Xiaolu Zhang, Jun Zhou (pp. 6372–6379)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce the information leakage but also improve the generalization ability of the DNN models in real-world applications. ... To verify our theoretical findings, we perform membership attacks on different real-world datasets to evaluate the information leakage of models trained with different optimization methods (e.g. SGD and SGLD). ... In this section, we conduct empirical studies to verify our theoretical finding, i.e., training models using SGLD can alleviate the information leakage of the training dataset. |
| Researcher Affiliation | Collaboration | Bingzhe Wu,¹ Chaochao Chen,² Shiwan Zhao,³ Cen Chen,² Yuan Yao,⁴ Guangyu Sun,¹ Li Wang,² Xiaolu Zhang,² Jun Zhou² — ¹Peking University, ²Ant Financial Service Group, ³IBM Research, ⁴Hong Kong University of Science and Technology |
| Pseudocode | No | The paper presents equations for the SGLD updating rule but does not include formally labeled pseudocode blocks or algorithm sections. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include links to a code repository. |
| Open Datasets | Yes | Specifically, we select two datasets as our benchmarks, namely, German Credit dataset (Dua and Graff 2017) and IDC dataset. ... IDC dataset is used for invasive ductal carcinoma (IDC) classification. This dataset contains 277,524 patches of 50 × 50 pixels... http://www.andrewjanowczyk.com/use-case-6-invasive-ductal-carcinoma-idc-segmentation/ |
| Dataset Splits | Yes | We randomly split the whole dataset into training (400 applications), hold-out/validation (300 applications), and test (300 applications) sets. ... Following the setting of the work (Leino and Fredrikson 2019), we split the whole dataset into training, validation (hold-out), and test sets. To be specific, the training dataset consists of 10,788 positive patches and 29,164 negative patches. The test dataset consists of 11,595 positive patches and 31,825 negative patches. The remaining patches are used as the hold-out dataset. |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific GPU or CPU models, memory details) used to run the experiments. |
| Software Dependencies | No | The paper describes the models and training strategies but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All these training strategies share the following hyper-parameters: the mini-batch size is set to 32 and the epoch number is set to 30. The learning rate decreases by half every 5 epochs. For SGLD, the variance of the prior is set to 1.0. The initial learning rate is set to 1 × 10⁻³. For the IDC dataset... The mini-batch size is set to 128 and the epoch number is set to 100. Data augmentation is not used. The learning rate decreases by half every 20 epochs. For SGLD, the variance σ² of the prior is set to 1.0. The initial learning rate is set to 1 × 10⁻⁴. |
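The paper presents the SGLD updating rule only as equations, with no pseudocode. As a point of reference, the rule it builds on (Welling and Teh's stochastic gradient Langevin dynamics) can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation; the function name, the toy Gaussian-mean example, and the specific step size are our own assumptions, while the Gaussian prior with variance σ² = 1.0 matches the hyper-parameters quoted in the Experiment Setup row.

```python
import numpy as np

def sgld_step(theta, grad_log_lik_minibatch, N, n, lr, prior_var=1.0, rng=None):
    """One SGLD update in the standard Welling & Teh (2011) form.

    theta                  : current parameter vector
    grad_log_lik_minibatch : sum of per-example gradients of the log-likelihood
                             over the current mini-batch
    N, n                   : full training-set size and mini-batch size
    lr                     : step size epsilon_t
    prior_var              : variance of the zero-mean Gaussian prior on theta
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gradient of log N(theta; 0, prior_var) with respect to theta.
    grad_log_prior = -theta / prior_var
    # Mini-batch gradient is rescaled by N/n to estimate the full-data gradient.
    drift = 0.5 * lr * (grad_log_prior + (N / n) * grad_log_lik_minibatch)
    # Injected Gaussian noise with variance equal to the step size is what
    # distinguishes SGLD from plain SGD and yields (approximate) posterior samples.
    noise = rng.normal(0.0, np.sqrt(lr), size=np.shape(theta))
    return theta + drift + noise

# Toy usage: sample the posterior mean of unit-variance Gaussian data.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)
theta = np.zeros(1)
samples = []
for t in range(2000):
    idx = rng.choice(data.size, size=32, replace=False)
    grad = np.sum(data[idx] - theta)          # grad log-lik for N(x; theta, 1)
    theta = sgld_step(theta, grad, N=data.size, n=32, lr=1e-3, rng=rng)
    if t >= 500:                              # discard burn-in
        samples.append(theta[0])
```

The injected noise term is the mechanism the paper analyzes: it perturbs each update, which is why SGLD-trained models are argued to leak less membership information than models trained with plain SGD.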