Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
Authors: Timothy Castiglia, Yi Zhou, Shiqiang Wang, Swanand Kadhe, Nathalie Baracaldo, Stacy Patterson
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches. |
| Researcher Affiliation | Collaboration | ¹Rensselaer Polytechnic Institute, ²IBM Research. |
| Pseudocode | Yes | Algorithm 1 LESS-VFL implemented using P-SGD |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | MIMIC-III (Johnson et al., 2016; Harutyunyan et al., 2019): Hospital dataset... Activity (Anguita et al., 2013): Time-series positional data... Phishing (Dua & Graff, 2017): Dataset... Gina (Guyon, 2007): Hand-written two-digit images. Sylva (Guyon, 2007): Forest cover type information. |
| Dataset Splits | No | The paper mentions splitting features among parties but does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions the ADAM optimizer and P-SGD but does not provide specific version numbers for any software libraries, frameworks, or operating systems used in the experimental setup. |
| Experiment Setup | Yes | We run a grid search to determine regularization parameters for LESS-VFL, local lasso, and group lasso, and the number of pre-training epochs for LESS-VFL and local lasso. We use the ADAM optimizer with a learning rate of 0.01 when employing Algorithm 2 in VFL (Original and Spurious) and pre-training and post feature selection in local lasso and LESS-VFL. We run 150 epochs of P-SGD for embedding component selection in LESS-VFL and feature selection in LESS-VFL and local lasso, which we found to be a sufficient amount of iterations for the training loss to plateau. |
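
The Experiment Setup row mentions P-SGD (proximal SGD) for feature selection with lasso-style regularizers. As a rough illustration only, the sketch below shows one generic proximal step with a group-lasso penalty, using block soft-thresholding, the standard proximal operator for that penalty. It is not the paper's Algorithm 1. The function names, the grouping of weights, and the `step_size` and `reg_strength` values are all hypothetical, since the summary above does not give them.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        # The whole group is zeroed out, i.e. the feature is dropped.
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


def psgd_step(groups, grads, step_size, reg_strength):
    """One proximal SGD step: gradient step on the loss, then the prox per group.

    `groups` holds one weight list per input feature (an assumed layout);
    `grads` holds the matching stochastic gradients of the unregularized loss.
    """
    updated = []
    for group, grad in zip(groups, grads):
        moved = [w - step_size * g for w, g in zip(group, grad)]
        updated.append(group_soft_threshold(moved, step_size * reg_strength))
    return updated


def selected_groups(groups):
    """Indices of groups that still have at least one nonzero weight."""
    return [i for i, g in enumerate(groups) if any(w != 0.0 for w in g)]


# Hypothetical toy values: one strong feature group and one near-zero group.
weights = [[3.0, 4.0], [0.01, 0.0]]
zero_grads = [[0.0, 0.0], [0.0, 0.0]]
weights = psgd_step(weights, zero_grads, step_size=0.1, reg_strength=1.0)
kept = selected_groups(weights)
```

In this toy run the second group has a norm below the threshold and is set exactly to zero, which is how a proximal method removes features outright instead of merely shrinking their weights.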