On the Importance of Difficulty Calibration in Membership Inference Attacks

Authors: Lauren Watson, Chuan Guo, Graham Cormode, Alexandre Sablayrolles

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effect of difficulty calibration, we perform a comprehensive evaluation of several score-based attacks on standard benchmark datasets. (A sketch of difficulty calibration appears after the table.)
Researcher Affiliation | Collaboration | Lauren Watson (University of Edinburgh); Chuan Guo, Graham Cormode, Alexandre Sablayrolles (Meta AI). Work done during an internship at Facebook. Email: lauren.watson@ed.ac.uk, {chuanguo, gcormode, asablayrolles}@fb.com
Pseudocode | No | The paper describes its methods and algorithms in paragraph text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | An implementation of these attacks is available at https://github.com/facebookresearch/calibration_membership.
Open Datasets | Yes | We perform experiments on several benchmark classification datasets: the German Credit, Hepatitis and Adult datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), MNIST (LeCun et al., 1998), CIFAR10/100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | We split the data into two sets: a private set, known only to the trainer, and a public set, which is used for training reference models and selecting the decision threshold τ. The trainer trains their model h on half of the private set, keeping the other half as non-members. ... To find a threshold for optimal accuracy, we first split the public set of examples in half again, and treat one half as members, with the rest as non-members. (This protocol is sketched in code after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the use of the 'Opacus' library for differentially private training, but it does not specify a version number for this or any other software dependency. (See the Opacus sketch after the table.)
Experiment Setup | Yes | The target models are trained for between 50 and 200 epochs, with batch sizes varying from 4 (for very small datasets) to 1024. For optimization, we use SGD with a learning rate of 0.1, Nesterov momentum of 0.9 and a cosine learning rate schedule for the CIFAR10/100 and ImageNet datasets. Smaller datasets such as the German Credit dataset also used weight decay of 1×10⁻⁴. (This configuration is sketched in code after the table.)
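
As context for the 'Research Type' row: the paper's central technique is difficulty calibration, which adjusts an example's membership score under the target model by how hard that example is for reference models trained without it. A minimal sketch, assuming a loss-based score where lower means more member-like (the function and variable names are ours, not the paper's):

```python
import numpy as np

def calibrated_score(target_loss, reference_losses):
    """Difficulty-calibrated membership score (sketch).

    target_loss: loss of the example under the target model.
    reference_losses: losses of the same example under reference models
        trained on public data that excludes the example.
    Subtracting the mean reference loss means an intrinsically easy
    example with low absolute loss is no longer mistaken for a member.
    """
    return target_loss - np.mean(reference_losses)

# Predict "member" when the calibrated score falls below a threshold tau
# chosen on the public split (see the threshold sketch below).
```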
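
The 'Dataset Splits' row quotes the evaluation protocol. The sketch below makes the four resulting index sets and the threshold selection explicit, under the assumption that "optimal accuracy" means maximizing attack accuracy over candidate thresholds (all names are ours):

```python
import numpy as np

def make_splits(n_examples, seed=0):
    """Split indices following the paper's protocol (our naming).

    Private set: half trains the target model (members), the other half
    is held out (non-members). Public set: trains reference models and,
    split in half again, supplies pseudo members/non-members for
    selecting the decision threshold tau.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    private, public = np.array_split(idx, 2)
    members, non_members = np.array_split(private, 2)
    pseudo_members, pseudo_non_members = np.array_split(public, 2)
    return members, non_members, pseudo_members, pseudo_non_members

def choose_threshold(member_scores, non_member_scores):
    """Pick the tau that maximizes attack accuracy on the public pseudo-split.

    Scores follow the convention of the calibration sketch above:
    lower score means more likely a member.
    """
    candidates = np.concatenate([member_scores, non_member_scores])
    def accuracy(tau):
        correct = (np.sum(member_scores < tau)
                   + np.sum(non_member_scores >= tau))
        return correct / (len(member_scores) + len(non_member_scores))
    return max(candidates, key=accuracy)
```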
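
On the 'Software Dependencies' row: the paper uses Opacus for differentially private training without pinning a version. For reference, the post-1.0 Opacus API wraps an existing model, optimizer, and data loader roughly as in the sketch below; the placeholder model and the hyperparameter values are illustrative, not the paper's:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = torch.nn.Linear(20, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 20), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# Wrap the training objects so gradients are clipped per sample and noised.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,  # illustrative value
    max_grad_norm=1.0,     # illustrative per-sample clipping norm
)
```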
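
The 'Experiment Setup' row maps onto a standard PyTorch configuration. A sketch with a placeholder model (the architecture, epoch count, and batch size vary by dataset in the paper):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(32 * 32 * 3, 10)  # placeholder; the paper trains standard image classifiers
epochs = 200                              # the paper reports 50 to 200 epochs

# SGD with learning rate 0.1 and Nesterov momentum 0.9, as reported; the
# 1e-4 weight decay is reported for smaller datasets such as German Credit.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True,
                weight_decay=1e-4)
# Cosine learning-rate schedule, used for the CIFAR10/100 and ImageNet runs.
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
```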