reproducibilityindex.ai

On the Importance of Difficulty Calibration in Membership Inference Attacks

Authors: Lauren Watson, Chuan Guo, Graham Cormode, Alexandre Sablayrolles

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate the effect of difficulty calibration, we perform a comprehensive evaluation of several score-based attacks on standard benchmark datasets.
Researcher Affiliation	Collaboration	Lauren Watson (University of Edinburgh); Chuan Guo, Graham Cormode, Alexandre Sablayrolles (Meta AI). Work done during an internship at Facebook. Email: lauren.watson@ed.ac.uk, {chuanguo, gcormode, asablayrolles}@fb.com
Pseudocode	No	The paper describes its methods and algorithms in paragraph text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	An implementation of these attacks is available at https://github.com/facebookresearch/calibration_membership.
Open Datasets	Yes	We perform experiments on several benchmark classification datasets: German Credit, Hepatitis and Adult datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), MNIST (LeCun et al., 1998), CIFAR10/100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009).
Dataset Splits	Yes	We split the data into two sets: a private set, known only to the trainer, and a public set, which is used for training reference models and selecting the decision threshold τ. The trainer trains their model h on half of the private set, keeping the other half as non-members. ... To find a threshold for optimal accuracy, we first split the public set of examples in half again, and treat one half as members, with the rest as non-members.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions the use of the 'Opacus' library for differentially private training, but it does not specify a version number for this or any other software dependency.
Experiment Setup	Yes	The target models are trained for between 50 and 200 epochs, with batch sizes varying from 4 (for very small datasets) to 1024. For optimization, we use SGD with a learning rate of 0.1, Nesterov momentum of 0.9 and a cosine learning rate schedule for the CIFAR10/100 and ImageNet datasets. Smaller datasets such as the German Credit dataset also used weight decay of 1×10⁻⁴.
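The Research Type row names difficulty calibration without spelling it out. As a rough illustration only, the sketch below assumes a loss-based attack in which the target model's loss on an example is compared with the average loss that reference models (trained on the public set, per the Dataset Splits row) assign to the same example. The helper names `cross_entropy`, `calibrated_score` and `predict_member` are hypothetical and are not taken from the authors' repository; the paper's exact scoring may differ.

```python
import math
from statistics import mean


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(probs[label])


def calibrated_score(target_probs, reference_probs_list, label):
    """Hypothetical difficulty-calibrated membership score for one example.

    The uncalibrated score is the negative loss under the target model h.
    Calibration adds back the mean loss under the reference models, so an
    example that every model finds hard is not automatically ruled a non-member.
    """
    target_loss = cross_entropy(target_probs, label)
    reference_loss = mean(cross_entropy(p, label) for p in reference_probs_list)
    return reference_loss - target_loss


def predict_member(score, tau):
    """Predict 'member' when the score exceeds the decision threshold tau."""
    return score > tau


# An easy non-member: every model is confident, so calibration cancels out.
easy = calibrated_score([0.05, 0.95], [[0.05, 0.95], [0.05, 0.95]], label=1)
# A hard member: reference models struggle, but the target model fits it.
hard = calibrated_score([0.4, 0.6], [[0.9, 0.1], [0.9, 0.1]], label=1)
```

With a threshold such as τ = 0.5, this calibrated rule flags the hard member and not the easy non-member, whereas ranking by raw loss alone would place the easy non-member above the hard member.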
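The procedure in the Dataset Splits row can be sketched as index bookkeeping plus a threshold search. This is a minimal sketch under assumptions the row does not state: a 50/50 private/public split, random shuffling with a fixed seed, and membership scores for the two public halves that are already computed (the row does not say how they are obtained). `split_indices` and `best_threshold` are hypothetical names.

```python
import random


def split_indices(n, seed=0):
    """Hypothetical index bookkeeping for the splits in the Dataset Splits row.

    Assumes the data is divided 50/50 into private and public sets.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    private, public = idx[: n // 2], idx[n // 2 :]
    # The trainer fits h on one half of the private set; the rest are non-members.
    half = len(private) // 2
    members, non_members = private[:half], private[half:]
    # The public set is halved again: one half plays "members", the other "non-members".
    pub_half = len(public) // 2
    pub_in, pub_out = public[:pub_half], public[pub_half:]
    return members, non_members, pub_in, pub_out


def best_threshold(in_scores, out_scores):
    """Return (tau, accuracy) maximising accuracy of the rule 'score > tau => member'."""
    candidates = sorted(set(in_scores) | set(out_scores))
    total = len(in_scores) + len(out_scores)
    best_tau, best_acc = None, -1.0
    # Trying each observed score (plus one value below all of them) covers every
    # distinct partition of the scores into predicted members and non-members.
    for tau in [candidates[0] - 1.0] + candidates:
        correct = sum(s > tau for s in in_scores) + sum(s <= tau for s in out_scores)
        acc = correct / total
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc
```

The threshold chosen on the public halves would then be applied unchanged to scores for the private members and non-members.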
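The optimizer settings in the Experiment Setup row can be collected in one place. The sketch below assumes a cosine schedule with no warmup that decays to zero over the full run, which the row does not specify; `SGD_CONFIG` and `cosine_lr` are hypothetical names, and the epoch count is left as a parameter because the row only gives a 50 to 200 range.

```python
import math

# Settings quoted for CIFAR10/100 and ImageNet. For smaller datasets such as
# German Credit, the row additionally reports weight_decay=1e-4.
SGD_CONFIG = {
    "lr": 0.1,
    "momentum": 0.9,
    "nesterov": True,
}


def cosine_lr(epoch, total_epochs, base_lr=SGD_CONFIG["lr"], min_lr=0.0):
    """Cosine-annealed learning rate at a given epoch (no warmup, no restarts)."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

If PyTorch is used (the row does not say so, although Opacus is a PyTorch library), these settings would map onto `torch.optim.SGD(model.parameters(), **SGD_CONFIG)` together with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs)`, stepping the scheduler once per epoch.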