LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning

Authors: Timothy Castiglia, Yi Zhou, Shiqiang Wang, Swanand Kadhe, Nathalie Baracaldo, Stacy Patterson

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
Researcher Affiliation | Collaboration | Rensselaer Polytechnic Institute; IBM Research.
Pseudocode | Yes | Algorithm 1: LESS-VFL, implemented using P-SGD (a sketch of the P-SGD step appears after this table).
Open Source Code | No | The paper does not contain an explicit statement about the release of its source code, nor does it provide a link to a code repository.
Open Datasets | Yes | MIMIC-III (Johnson et al., 2016; Harutyunyan et al., 2019): Hospital dataset... Activity (Anguita et al., 2013): Time-series positional data... Phishing (Dua & Graff, 2017): Dataset... Gina (Guyon, 2007): Hand-written two-digit images. Sylva (Guyon, 2007): Forest cover type information.
Dataset Splits | No | The paper mentions splitting features among parties but does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types.
Software Dependencies | No | The paper mentions the ADAM optimizer and P-SGD but does not provide version numbers for any software libraries, frameworks, or operating systems used in the experimental setup.
Experiment Setup | Yes | We run a grid search to determine regularization parameters for LESS-VFL, local lasso, and group lasso, and the number of pre-training epochs for LESS-VFL and local lasso. We use the ADAM optimizer with a learning rate of 0.01 when employing Algorithm 2 in VFL (Original and Spurious) and for pre-training and post-feature-selection training in local lasso and LESS-VFL. We run 150 epochs of P-SGD for embedding component selection in LESS-VFL and for feature selection in LESS-VFL and local lasso, which we found to be a sufficient number of iterations for the training loss to plateau. (A hedged configuration sketch follows this table.)
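The "Pseudocode" and "Experiment Setup" rows both refer to proximal SGD (P-SGD) with a group-lasso penalty, which is the mechanism the paper uses for embedding component and feature selection. The snippet below is a minimal sketch of that proximal step, not the authors' code: the grouping of weights, the regularization strength `lam`, and the step size `lr` are generic placeholders rather than values taken from the paper.

```python
import torch

def group_lasso_prox(w, lam, lr):
    """Proximal operator of the group-lasso penalty lam * ||w||_2 (block
    soft-thresholding) for one parameter group, with step size lr."""
    norm = w.norm(p=2)
    if norm <= lam * lr:
        return torch.zeros_like(w)        # whole group zeroed out, i.e., dropped
    return w * (1.0 - lam * lr / norm)    # otherwise the group is shrunk toward zero

def psgd_step(groups, grads, lam, lr):
    """One P-SGD step: a gradient step on the smooth loss, then the group-lasso
    proximal map on each group (e.g., the weights tied to one embedding
    component or one input feature)."""
    return [group_lasso_prox(w - lr * g, lam, lr) for w, g in zip(groups, grads)]
```

Groups whose weights are driven exactly to zero by the proximal map correspond to embedding components or features that can be removed, which is how a sparsity-inducing penalty of this form performs selection.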
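As a rough illustration of the reported setup (a grid search over regularization strengths and pre-training epochs, ADAM at learning rate 0.01, and 150 epochs of P-SGD for the selection phases), the sketch below wires those reported numbers into a generic search loop. The grid values, the `build_model`, `select_and_retrain`, `evaluate`, and loader arguments are placeholders of our own; the paper does not quote them, and how candidates are validated is not specified in the excerpt.

```python
import itertools
import torch

LEARNING_RATE = 0.01     # ADAM learning rate reported in the paper
SELECTION_EPOCHS = 150   # P-SGD epochs reported for the selection phases

# Hypothetical grids; the paper reports a grid search but not the candidate values.
REG_GRID = [1e-4, 1e-3, 1e-2]
PRETRAIN_EPOCH_GRID = [5, 10, 20]

def pretrain(model, loader, loss_fn, epochs):
    """Pre-training (and, analogously, post-selection training) with ADAM at lr = 0.01."""
    opt = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def grid_search(build_model, loader, val_loader, loss_fn, select_and_retrain, evaluate):
    """Sweep the hyperparameter grid; `select_and_retrain` stands in for the
    SELECTION_EPOCHS of P-SGD selection (see the previous sketch) plus retraining."""
    best = None
    for lam, pretrain_epochs in itertools.product(REG_GRID, PRETRAIN_EPOCH_GRID):
        model = build_model()
        pretrain(model, loader, loss_fn, pretrain_epochs)
        select_and_retrain(model, lam)
        score = evaluate(model, val_loader)
        if best is None or score > best[0]:
            best = (score, lam, pretrain_epochs)
    return best
```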