TabLeak: Tabular Data Leakage in Federated Learning

Authors: Mark Vero, Mislav Balunović, Dimitar Iliev Dimitrov, Martin Vechev

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe.
Researcher Affiliation | Academia | ¹Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland; ²Department of Computer Science, ETH Zurich, Zurich, Switzerland.
Pseudocode | Yes | Algorithm 1: TabLeak against training by FedSGD
Open Source Code | Yes | Our code is available at: https://github.com/eth-sri/tableak.
Open Datasets | Yes | All experiments were carried out on four popular mixed-type tabular binary classification datasets: the Adult census dataset (Dua & Graff, 2017), the German Credit dataset (Dua & Graff, 2017), the Lawschool Admission dataset (Wightman, 2017), and the Health Heritage dataset from Kaggle (2012).
Dataset Splits | No | The paper reports model performance on a "secluded test set" and mentions a "separate validation set" in a general discussion of preventing overfitting, but it never specifies concrete train/validation/test splits (e.g., percentages or counts) for its own experiments, so the data partitioning cannot be reproduced.
Hardware Specification | Yes | We ran each of our experiments on single cores of Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide version numbers for any software or libraries used in the experiments (e.g., "PyTorch 1.9").
Experiment Setup | Yes | For all attacks, we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.06 for 1,500 iterations and without a learning rate schedule to perform the optimization in Alg. 1. (A minimal sketch of this optimization follows the table.)
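
The Experiment Setup row pins down only the attacker-side optimizer settings. Below is a minimal sketch of a generic gradient-inversion loop using exactly those settings (Adam, learning rate 0.06, 1,500 iterations, no schedule), assuming PyTorch. The toy network, the known labels, and the squared-error gradient-matching objective are illustrative assumptions, not the paper's Algorithm 1, which additionally handles the discrete categorical features and ensembles multiple reconstructions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the federated model; the actual experiments use small
# fully connected networks on tabular inputs.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

# The client's private batch and the FedSGD gradient it shares with the server.
x_true = torch.randn(8, 10)
y_true = torch.randint(0, 2, (8,))
shared_grads = torch.autograd.grad(
    loss_fn(model(x_true), y_true), model.parameters()
)

# Attacker: optimize a dummy batch so that its gradient matches the shared one.
# Labels are assumed known here; label reconstruction is handled separately
# in this line of attacks.
x_dummy = torch.randn(8, 10, requires_grad=True)
optimizer = torch.optim.Adam([x_dummy], lr=0.06)  # settings quoted in the table

for _ in range(1500):  # 1,500 iterations, no learning-rate schedule
    optimizer.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_true), model.parameters(), create_graph=True
    )
    # Squared-error gradient matching; an assumption in this sketch, since the
    # paper's objective is cosine-similarity based.
    rec_loss = sum(
        ((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads)
    )
    rec_loss.backward()
    optimizer.step()

print(f"final gradient-matching loss: {rec_loss.item():.4f}")
```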