CAFE: Catastrophic Data Leakage in Vertical Federated Learning

Authors: Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, Tianyi Chen

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comparing to existing data leakage attacks, our extensive experimental results on vertical FL settings demonstrate the effectiveness of CAFE to perform large-batch data leakage attack with improved data recovery quality. We conduct extensive experiments on MNIST [18], CIFAR-10 [17] and Linnaeus 5 [4] datasets in VFL settings.
Researcher Affiliation | Collaboration | Xiao Jin (Rensselaer Polytechnic Institute, jinx2@rpi.edu); Pin-Yu Chen (IBM Research, pin-yu.chen@ibm.com); Chia-Yi Hsu (National Yang Ming Chiao Tung University, chiayihsu8315@gmail.com); Chia-Mu Yu (National Yang Ming Chiao Tung University, chiamuyu@gmail.com); Tianyi Chen (Rensselaer Polytechnic Institute, chent18@rpi.edu)
Pseudocode | Yes | Algorithm 1: Recover the gradients ∇L(Θ, D) (regular VFL and attacker); Algorithm 2: Recover the inputs to the first FC layer H (regular VFL and attacker); Algorithm 3: CAFE (Nested-loops); Algorithm 4: CAFE (Single-loop). A minimal gradient-matching sketch is given after the table.
Open Source Code | Yes | The code of our work is available at https://github.com/DeRafael/CAFE.
Open Datasets | Yes | We conduct experiments on MNIST [18], CIFAR-10 [17] and Linnaeus 5 [4] datasets in VFL settings.
Dataset Splits | No | No specific training/validation/test dataset splits (percentages or counts) are explicitly provided, nor are citations to predefined standard splits for all datasets.
Hardware Specification | Yes | Scaling up to our hardware limits (RTX 2080 and TITAN V), CAFE can leak as many as 800 images in the VFL setting including 4 workers with a batch size as large as 100.
Software Dependencies | No | The paper mentions optimizers like SGD and Adam, but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The hyper-parameter settings are shown in Appendix G.1. We use the SGD optimizer with learning rate set as 0.1, σ² = 1.1, and = 1000 for fake gradients. CAFE is able to recover training images when the learning rate (lr) is relatively small. Increasing the learning rate renders data leakage more difficult because the model is making more sizeable parameter changes in each iteration, which can be regarded as an effective defense strategy. Adam with learning rate 10⁻⁶, trained on 800 images, tested on 100 images, batch size K = 40. A hedged configuration sketch of these settings follows the table.
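
In CAFE's VFL setting, each of the 4 workers holds a vertical slice of every sample (for images, roughly a quadrant), and the attacker recovers the private batch by optimizing dummy data until its gradients match the observed ones. The sketch below abstracts the vertical split away and shows only this core gradient-matching idea shared by Algorithms 3/4. It is a minimal PyTorch sketch, not the authors' TensorFlow implementation at https://github.com/DeRafael/CAFE; the tiny linear model, random stand-in batch, soft-label parameterization, and iteration count are illustrative assumptions, while the SGD recovery optimizer with learning rate 0.1 follows the setting quoted above.

```python
# Minimal gradient-matching data-leakage sketch (DLG-style), illustrating the
# general idea behind CAFE's Algorithms 3/4. NOT the authors' implementation;
# model, data, and iteration count are illustrative assumptions.
# Requires PyTorch >= 1.10 (cross_entropy with probability targets).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-in model; the paper uses per-worker CNN extractors plus FC layers.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

# "True" private batch held by the workers (random CIFAR-10-shaped stand-in).
x_true = torch.rand(4, 3, 32, 32)
y_true = torch.randint(0, 10, (4,))

# The attacker observes only the gradients of the shared model on this batch.
loss_true = F.cross_entropy(model(x_true), y_true)
true_grads = torch.autograd.grad(loss_true, model.parameters())

# Attacker initializes dummy data and soft labels, then matches gradients.
x_fake = torch.rand_like(x_true, requires_grad=True)
y_fake = torch.randn(4, 10, requires_grad=True)      # soft labels, optimized jointly
opt = torch.optim.SGD([x_fake, y_fake], lr=0.1)       # paper reports SGD with lr 0.1

for step in range(2000):
    opt.zero_grad()
    loss_fake = F.cross_entropy(model(x_fake), F.softmax(y_fake, dim=-1))
    fake_grads = torch.autograd.grad(loss_fake, model.parameters(), create_graph=True)
    # Gradient-matching objective: squared distance between fake and observed gradients.
    match = sum(((fg - tg) ** 2).sum() for fg, tg in zip(fake_grads, true_grads))
    match.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: gradient-matching loss {match.item():.6f}")
```

As the matching loss approaches zero, x_fake approaches the private batch; CAFE additionally recovers the aggregated gradients and first-FC-layer inputs (Algorithms 1 and 2) to make this work at large batch sizes.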
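
For readability, the hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The numbers below are the reported ones (SGD with learning rate 0.1 and σ² = 1.1 on the attack/defense side; Adam with learning rate 10⁻⁶, 800 training images, 100 test images, batch size K = 40 for VFL training); the dictionary names, the worker count of 4 (taken from the Hardware Specification row), and the use of torch.optim are assumptions for illustration, not the authors' configuration code.

```python
# Hedged reconstruction of the reported experiment setup (Appendix G.1 of the paper).
# Values are quoted from the table above; names and library usage are assumptions.
import torch

attack_cfg = {
    "optimizer": "SGD",   # data-recovery (attack) optimizer
    "lr": 0.1,            # reported attack learning rate
    "sigma_sq": 1.1,      # reported sigma^2 used with the fake-gradients defense
}

train_cfg = {
    "optimizer": "Adam",
    "lr": 1e-6,           # reported VFL training learning rate
    "num_train_images": 800,
    "num_test_images": 100,
    "batch_size": 40,     # K = 40
    "num_workers": 4,     # VFL parties holding vertical feature splits
}

# Instantiating the two optimizers under these assumptions:
dummy_params = [torch.zeros(3, requires_grad=True)]
attack_opt = torch.optim.SGD(dummy_params, lr=attack_cfg["lr"])
train_opt = torch.optim.Adam(dummy_params, lr=train_cfg["lr"])
```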