TabLeak: Tabular Data Leakage in Federated Learning

Authors: Mark Vero, Mislav Balunović, Dimitar Iliev Dimitrov, Martin Vechev

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe.
Researcher Affiliation | Academia | ¹Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland; ²Department of Computer Science, ETH Zurich, Zurich, Switzerland.
Pseudocode | Yes | Algorithm 1: TabLeak against training by FedSGD
Open Source Code | Yes | Our code is available at: https://github.com/eth-sri/tableak.
Open Datasets | Yes | All experiments were carried out on four popular mixed-type tabular binary classification datasets: the Adult census dataset (Dua & Graff, 2017), the German Credit dataset (Dua & Graff, 2017), the Lawschool Admission dataset (Wightman, 2017), and the Health Heritage dataset from Kaggle (2012).
Dataset Splits | No | The paper reports model performance on a "secluded test set" and mentions a "separate validation set" in a general discussion of preventing overfitting, but it never specifies concrete train/validation/test splits (e.g., percentages or counts) for its own experiments, so the data partitioning cannot be reproduced.
Hardware Specification | Yes | We ran each of our experiments on single cores of Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide version numbers for any software or libraries used in the experiments (e.g., "PyTorch 1.9").
Experiment Setup | Yes | For all attacks, we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.06 for 1,500 iterations and without a learning rate schedule to perform the optimization in Alg. 1. (A minimal sketch of this optimization follows the table.)
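
The Experiment Setup row pins down only the attacker-side optimizer settings. Below is a minimal sketch of a generic gradient-inversion loop using exactly those settings (Adam, learning rate 0.06, 1,500 iterations, no schedule), assuming PyTorch. The toy network, the known labels, and the squared-error gradient-matching objective are illustrative assumptions, not the paper's Algorithm 1, which additionally handles the discrete categorical features and ensembles multiple reconstructions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the federated model; the actual experiments use small
# fully connected networks on tabular inputs.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

# The client's private batch and the FedSGD gradient it shares with the server.
x_true = torch.randn(8, 10)
y_true = torch.randint(0, 2, (8,))
shared_grads = torch.autograd.grad(
    loss_fn(model(x_true), y_true), model.parameters()
)

# Attacker: optimize a dummy batch so that its gradient matches the shared one.
# Labels are assumed known here; label reconstruction is handled separately
# in this line of attacks.
x_dummy = torch.randn(8, 10, requires_grad=True)
optimizer = torch.optim.Adam([x_dummy], lr=0.06)  # settings quoted in the table

for _ in range(1500):  # 1,500 iterations, no learning-rate schedule
    optimizer.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_true), model.parameters(), create_graph=True
    )
    # Squared-error gradient matching; an assumption in this sketch, since the
    # paper's objective is cosine-similarity based.
    rec_loss = sum(
        ((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads)
    )
    rec_loss.backward()
    optimizer.step()

print(f"final gradient-matching loss: {rec_loss.item():.4f}")
```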