Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FLUX: Efficient Descriptor-Driven Clustered Federated Learning under Arbitrary Distribution Shifts

Authors: Dario Fenoglio, Mohan Li, Pietro Barbiero, Nicholas D. Lane, Marc Langheinrich, Martin Gjoreski

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across four standard benchmarks, two real-world datasets and ten state-of-the-art baselines show that FLUX improves performance and stability under diverse distribution shifts, achieving an average accuracy gain of up to 23 percentage points over the best-performing baselines while maintaining computational and communication overhead comparable to FedAvg.
Researcher Affiliation	Collaboration	Dario Fenoglio (Università della Svizzera italiana, Lugano, Switzerland); Mohan Li (Università della Svizzera italiana, Lugano, Switzerland); Pietro Barbiero (IBM Research, Zurich, Switzerland); Nicholas D. Lane (University of Cambridge, Cambridge, United Kingdom); Marc Langheinrich (Università della Svizzera italiana, Lugano, Switzerland); Martin Gjoreski (Università della Svizzera italiana, Lugano, Switzerland)
Pseudocode	Yes	We provide the pseudo-code for both the training phase (Algorithm 1) and inference phase (Algorithm 2) of our proposed implementation of FLUX.
Open Source Code	Yes	To ensure reproducibility, our code, along with detailed instructions for reproducing the experiments, is publicly accessible on GitHub under the MIT license. https://github.com/dariofenoglio98/FLUX
Open Datasets	Yes	We use six publicly available datasets in our experiments: the standard benchmarks MNIST [28], Fashion-MNIST (FMNIST) [29], CIFAR-10, and CIFAR-100 [30], and the real-world datasets CheXpert [31] and Office-Home [32].
Dataset Splits	Yes	We adopt a 5-fold cross-validation strategy to evaluate model performance, using fixed random seeds (42, 43, 44, 45, and 46) to ensure reproducibility. ... Each client reserves 20% of its local data for validation. ... To simulate non-IID conditions in FL, we employ ANDA, a publicly available toolkit that enables data operations such as class isolation and label swapping. See Appendix B.1 for details on ANDA and the dataset partitioning strategy.
Hardware Specification	Yes	All experiments were conducted on a workstation equipped with four NVIDIA RTX A6000 GPUs (48 GB each), two AMD EPYC 7513 32-Core processors, and 512 GB of RAM.
Software Dependencies	Yes	Our experiments were implemented using Python 3.12 and open-source libraries including PyTorch 2.4 [71] (BSD license), Scikit-learn 1.5 [72] (BSD license), and Flower 1.11 [73] (Apache License). For visualization, we utilized Matplotlib 3.9 [74] (BSD license) and Seaborn 0.13 [75] (BSD license), while data processing was performed using Pandas 2.2 [76] (BSD license).
Experiment Setup	Yes	We adopt a 5-fold cross-validation strategy to evaluate model performance, using fixed random seeds (42, 43, 44, 45, and 46) to ensure reproducibility. For the MNIST, FMNIST, and CIFAR-10 datasets, we use LeNet-5 [69] as the base model; for CIFAR-100, CheXpert, and Office-Home, we use ResNet-9 [70]. A batch size of 64 is used for both training and evaluation. Each client reserves 20% of its local data for validation. The FL process runs for 10 communication rounds on MNIST, FMNIST, and CIFAR-10, 15 rounds on CIFAR-100, 20 rounds on CheXpert, and 40 rounds on Office-Home, with each client performing 2 local training epochs per round. The learning rate is set to 0.005 with a momentum of 0.9. All models are trained using cross-entropy loss, except on CheXpert, where a binary cross-entropy loss is used due to the multi-label classification setting.
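The Research Type row compares FLUX's overhead with FedAvg. For readers unfamiliar with that baseline, here is a minimal sketch of FedAvg's server-side aggregation step: a weighted average of client parameters, with weights proportional to each client's local sample count. It uses flat Python lists in place of model tensors for illustration. It shows the baseline only, not FLUX's descriptor-driven aggregation (the paper's Algorithms 1 and 2 are not reproduced on this page).

```python
from typing import Sequence


def fedavg(client_params: Sequence[Sequence[float]], num_samples: Sequence[int]) -> list[float]:
    """FedAvg aggregation: average client parameter vectors weighted by local sample counts."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, n in zip(client_params, num_samples):
        if len(params) != dim:
            raise ValueError("all clients must send parameters of the same shape")
        weight = n / total  # clients with more data contribute proportionally more
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Two clients holding 1 and 3 samples: weights are 0.25 and 0.75.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Because the server only computes one weighted sum per round, FedAvg is a natural yardstick for the computational and communication overhead reported in that row.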
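The Dataset Splits row describes a fixed-seed protocol, a 20% per-client validation hold-out, and ANDA operations such as class isolation and label swapping. The following sketch shows what these three operations look like on plain index and label lists. All function names are hypothetical and this is not ANDA's API; see the paper's Appendix B.1 and the ANDA toolkit for the actual partitioning logic.

```python
import random


def split_client_data(indices, val_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle one client's sample indices with a fixed seed
    and hold out `val_fraction` of them for validation. Returns (train, val)."""
    rng = random.Random(seed)  # local RNG, so the split does not depend on global state
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def swap_labels(labels, class_a, class_b):
    """Hypothetical label-swapping shift: class_a becomes class_b and vice versa."""
    mapping = {class_a: class_b, class_b: class_a}
    return [mapping.get(y, y) for y in labels]


def isolate_classes(indices, labels, keep):
    """Hypothetical class isolation: keep only samples whose label is in `keep`."""
    keep = set(keep)
    return [i for i, y in zip(indices, labels) if y in keep]


train, val = split_client_data(range(100), val_fraction=0.2, seed=42)
print(len(train), len(val))  # 80 20
```

Assuming each of the five folds corresponds to one seed, repeating `split_client_data` for each seed in (42, 43, 44, 45, 46) would yield five reproducible train/validation partitions per client.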
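The Software Dependencies row specifies Python 3.12 and major.minor versions of PyTorch, scikit-learn, Flower, Matplotlib, Seaborn, and pandas. Below is a hypothetical helper, not part of the FLUX repository as far as this page shows, that compares a local environment against those versions using the standard library's `importlib.metadata`. The PyPI distribution names (for example `flwr` for Flower) are an assumption about how the libraries were installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Major.minor versions from the paper; distribution names are assumed, not stated in the source.
EXPECTED = {
    "torch": "2.4",
    "scikit-learn": "1.5",
    "flwr": "1.11",
    "matplotlib": "3.9",
    "seaborn": "0.13",
    "pandas": "2.2",
}


def matches(installed: str, expected: str) -> bool:
    """Compare version components, so '1.1.0' does not match '1.11'
    while '2.4.1+cu121' does match '2.4'."""
    want = expected.split(".")
    return installed.split(".")[: len(want)] == want


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, ok)} for each pinned package."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)  # not installed
            continue
        report[name] = (have, matches(have, want))
    return report


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"python {py} (expected 3.12): {'ok' if py == '3.12' else 'MISMATCH'}")
    for name, (have, ok) in check_environment().items():
        print(f"{name}: {have or 'not installed'} -> {'ok' if ok else 'MISMATCH'}")
```

Matching only on major.minor mirrors the precision the paper reports. Patch-level differences are tolerated.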
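The Experiment Setup row lists per-dataset models, round counts, and shared hyperparameters. This sketch collects them into one configuration table so the derived quantity, total local epochs per client (rounds × 2), is explicit. The `RunConfig` name and the loss labels are illustrative choices, not identifiers from the FLUX code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings reported in the Experiment Setup row."""
    model: str
    rounds: int                # communication rounds
    loss: str                  # illustrative label, not a library identifier
    local_epochs: int = 2      # local training epochs per round
    batch_size: int = 64       # used for both training and evaluation
    lr: float = 0.005
    momentum: float = 0.9
    val_fraction: float = 0.2  # per-client validation hold-out

    @property
    def total_local_epochs(self) -> int:
        # Each client trains `local_epochs` epochs in every communication round.
        return self.rounds * self.local_epochs


CONFIGS = {
    "MNIST": RunConfig("LeNet-5", 10, "cross_entropy"),
    "FMNIST": RunConfig("LeNet-5", 10, "cross_entropy"),
    "CIFAR-10": RunConfig("LeNet-5", 10, "cross_entropy"),
    "CIFAR-100": RunConfig("ResNet-9", 15, "cross_entropy"),
    "CheXpert": RunConfig("ResNet-9", 20, "binary_cross_entropy"),  # multi-label task
    "Office-Home": RunConfig("ResNet-9", 40, "cross_entropy"),
}
SEEDS = (42, 43, 44, 45, 46)

for name, cfg in CONFIGS.items():
    print(f"{name}: {cfg.model}, {cfg.rounds} rounds, {cfg.total_local_epochs} local epochs")
```

For example, Office-Home amounts to 40 × 2 = 80 local epochs per client, versus 20 for MNIST. In PyTorch these settings would plausibly map onto `torch.optim.SGD(params, lr=0.005, momentum=0.9)` with `nn.CrossEntropyLoss`, and `nn.BCEWithLogitsLoss` for CheXpert if the model outputs logits. That mapping is our assumption; the source only names the loss types.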