Contamination Attacks and Mitigation in Multi-Party Machine Learning

Authors: Jamie Hayes, Olga Ohrimenko

NeurIPS 2018

Reproducibility Variable Result LLM Response
Research Type Experimental This paper makes the following contributions: We identify contamination attacks that are stealthy and cause a model to learn an artificial connection between an attribute and label. Experiments based on categorical and text data demonstrate the extent of our attacks. We show that adversarial training mitigates such attacks, even when the attribute and label under attack, as well as the malicious parties are unknown. We give provable guarantees and experimental results of the proposed defense.
Researcher Affiliation Collaboration Jamie Hayes (University College London) and Olga Ohrimenko (Microsoft Research)
Pseudocode Yes Table 1: Left: Attacker's procedure for contaminating b records from its dataset D_train. Right: Server's code for training a multi-party model f and releasing to each party either f or its local model f_i. (A hedged sketch of both procedures appears as the first code block after this table.)
Open Source Code No The paper does not provide any explicit statement or link for the open-source code of its methodology.
Open Datasets Yes We evaluated the attack on three datasets: UCI Adult (ADULT), UCI Credit Card (CREDIT CARD), and News20 (NEWS20), all publicly available. (The download URL quoted in the source is garbled and is omitted here.)
Dataset Splits Yes The CREDIT CARD dataset... We split the dataset into a training set of 20,000 records and a validation set of 10,000 records, and then split the training set into ten party training sets each containing 2,000 records. (See the second code block after the table.)
Hardware Specification No The paper describes the model architecture (fully-connected neural network, CNN) but does not provide specific hardware details like GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies No The paper mentions activation functions (ReLU, log-softmax) and optimization methods (stochastic gradient descent) but does not list specific software libraries or their version numbers.
Experiment Setup Yes The model is optimized using stochastic gradient descent with a learning rate of 0.01 and momentum of 0.5. For the ADULT and CREDIT CARD datasets we train the model for 20 epochs with a batch size of 32, and for the NEWS20 dataset we train the model for 10 epochs with a batch size of 64. (See the third code block after the table.)
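
The Pseudocode row above quotes only the caption of Table 1; the paper's actual pseudocode is not reproduced in this report. Below is a minimal Python sketch of the flow that caption describes: an attacker implants an artificial attribute-label connection in b of its records, and a server trains a joint multi-party model over the pooled party datasets alongside per-party local models. The function names (contaminate, train_multiparty), the NumPy array layout, and the abstract train_fn are illustrative assumptions, not the authors' code.

```python
import numpy as np

def contaminate(D_train, b, attr_idx, attr_val, target_label, rng=None):
    """Attacker's procedure (sketch): pick b records from its local
    dataset, set a chosen attribute to a fixed value, and relabel them
    so the model learns an artificial attribute-label connection."""
    rng = rng or np.random.default_rng(0)
    X, y = D_train
    idx = rng.choice(len(X), size=b, replace=False)
    X, y = X.copy(), y.copy()
    X[idx, attr_idx] = attr_val   # implant the contaminated attribute
    y[idx] = target_label         # attach the attacker's chosen label
    return X, y

def train_multiparty(party_datasets, train_fn):
    """Server's procedure (sketch): pool the parties' datasets, train a
    joint model f, and train per-party local models f_i; each party is
    then released either f or its own f_i."""
    X = np.concatenate([X_i for X_i, _ in party_datasets])
    y = np.concatenate([y_i for _, y_i in party_datasets])
    f = train_fn(X, y)                                  # joint model
    local = [train_fn(X_i, y_i) for X_i, y_i in party_datasets]
    return f, local
```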
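A minimal sketch of the reported CREDIT CARD split: 20,000 training records, 10,000 validation records, and the training set divided into ten party sets of 2,000 records each. The shuffle, the fixed seed, and the NumPy representation are assumptions; the paper does not specify how records were assigned.

```python
import numpy as np

def split_credit_card(X, y, n_train=20_000, n_val=10_000, n_parties=10, seed=0):
    """Split into train/validation, then partition the training set
    evenly across the parties (2,000 records each with the defaults)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    per_party = n_train // n_parties  # 2,000 records per party
    parties = [
        (X[train_idx[i * per_party:(i + 1) * per_party]],
         y[train_idx[i * per_party:(i + 1) * per_party]])
        for i in range(n_parties)
    ]
    return parties, (X[val_idx], y[val_idx])
```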
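A minimal sketch of the reported training setup. The paper names the optimizer (SGD, learning rate 0.01, momentum 0.5), the epoch counts, and the batch sizes, and elsewhere mentions ReLU and log-softmax layers; the choice of PyTorch, NLLLoss, the hidden width of 64, and the TensorDataset wrapping are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_model(n_features, n_classes, hidden=64):
    """Fully-connected net with ReLU and a log-softmax output, as the
    paper describes; the hidden width is an assumption."""
    return nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_classes), nn.LogSoftmax(dim=1))

def train(model, X, y, epochs, batch_size):
    """Reported hyperparameters: SGD with lr=0.01 and momentum=0.5.
    ADULT / CREDIT CARD: epochs=20, batch_size=32;
    NEWS20: epochs=10, batch_size=64."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    loss_fn = nn.NLLLoss()  # pairs with the log-softmax output layer
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model
```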