Contamination Attacks and Mitigation in Multi-Party Machine Learning

Authors: Jamie Hayes, Olga Ohrimenko

NeurIPS 2018

Reproducibility Variable Result LLM Response
Research Type Experimental This paper makes the following contributions: We identify contamination attacks that are stealthy and cause a model to learn an artificial connection between an attribute and label. Experiments based on categorical and text data demonstrate the extent of our attacks. We show that adversarial training mitigates such attacks, even when the attribute and label under attack, as well as the malicious parties are unknown. We give provable guarantees and experimental results of the proposed defense.
Researcher Affiliation Collaboration Jamie Hayes (University College London) and Olga Ohrimenko (Microsoft Research)
Pseudocode Yes Table 1: Left: Attacker's procedure for contaminating b records from its dataset D_train. Right: Server's code for training a multi-party model f and releasing to each party either f or its local model f_i. (A hedged sketch of both procedures appears as the first code block after this table.)
Open Source Code No The paper does not provide any explicit statement or link for the open-source code of its methodology.
Open Datasets Yes We evaluated the attack on three datasets: UCI Adult (ADULT), UCI Credit Card (CREDIT CARD), and News20 (NEWS20), all publicly available. (The download URL quoted in the source is garbled and is omitted here.)
Dataset Splits Yes The CREDIT CARD dataset... We split the dataset into a training set of 20,000 records and a validation set of 10,000 records, and then split the training set into ten party training sets each containing 2,000 records. (See the second code block after the table.)
Hardware Specification No The paper describes the model architecture (fully-connected neural network, CNN) but does not provide specific hardware details like GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies No The paper mentions activation functions (ReLU, log-softmax) and optimization methods (stochastic gradient descent) but does not list specific software libraries or their version numbers.
Experiment Setup Yes The model is optimized using stochastic gradient descent with a learning rate of 0.01 and momentum of 0.5. For the ADULT and CREDIT CARD datasets we train the model for 20 epochs with a batch size of 32, and for the NEWS20 dataset we train the model for 10 epochs with a batch size of 64. (See the third code block after the table.)
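
The Pseudocode row above quotes only the caption of Table 1; the paper's actual pseudocode is not reproduced in this report. Below is a minimal Python sketch of the flow that caption describes: an attacker implants an artificial attribute-label connection in b of its records, and a server trains a joint multi-party model over the pooled party datasets alongside per-party local models. The function names (contaminate, train_multiparty), the NumPy array layout, and the abstract train_fn are illustrative assumptions, not the authors' code.

```python
import numpy as np

def contaminate(D_train, b, attr_idx, attr_val, target_label, rng=None):
    """Attacker's procedure (sketch): pick b records from its local
    dataset, set a chosen attribute to a fixed value, and relabel them
    so the model learns an artificial attribute-label connection."""
    rng = rng or np.random.default_rng(0)
    X, y = D_train
    idx = rng.choice(len(X), size=b, replace=False)
    X, y = X.copy(), y.copy()
    X[idx, attr_idx] = attr_val   # implant the contaminated attribute
    y[idx] = target_label         # attach the attacker's chosen label
    return X, y

def train_multiparty(party_datasets, train_fn):
    """Server's procedure (sketch): pool the parties' datasets, train a
    joint model f, and train per-party local models f_i; each party is
    then released either f or its own f_i."""
    X = np.concatenate([X_i for X_i, _ in party_datasets])
    y = np.concatenate([y_i for _, y_i in party_datasets])
    f = train_fn(X, y)                                  # joint model
    local = [train_fn(X_i, y_i) for X_i, y_i in party_datasets]
    return f, local
```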
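A minimal sketch of the reported CREDIT CARD split: 20,000 training records, 10,000 validation records, and the training set divided into ten party sets of 2,000 records each. The shuffle, the fixed seed, and the NumPy representation are assumptions; the paper does not specify how records were assigned.

```python
import numpy as np

def split_credit_card(X, y, n_train=20_000, n_val=10_000, n_parties=10, seed=0):
    """Split into train/validation, then partition the training set
    evenly across the parties (2,000 records each with the defaults)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    per_party = n_train // n_parties  # 2,000 records per party
    parties = [
        (X[train_idx[i * per_party:(i + 1) * per_party]],
         y[train_idx[i * per_party:(i + 1) * per_party]])
        for i in range(n_parties)
    ]
    return parties, (X[val_idx], y[val_idx])
```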
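A minimal sketch of the reported training setup. The paper names the optimizer (SGD, learning rate 0.01, momentum 0.5), the epoch counts, and the batch sizes, and elsewhere mentions ReLU and log-softmax layers; the choice of PyTorch, NLLLoss, the hidden width of 64, and the TensorDataset wrapping are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_model(n_features, n_classes, hidden=64):
    """Fully-connected net with ReLU and a log-softmax output, as the
    paper describes; the hidden width is an assumption."""
    return nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_classes), nn.LogSoftmax(dim=1))

def train(model, X, y, epochs, batch_size):
    """Reported hyperparameters: SGD with lr=0.01 and momentum=0.5.
    ADULT / CREDIT CARD: epochs=20, batch_size=32;
    NEWS20: epochs=10, batch_size=64."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    loss_fn = nn.NLLLoss()  # pairs with the log-softmax output layer
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model
```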