Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

FedFree: Breaking Knowledge-sharing Barriers through Layer-wise Alignment in Heterogeneous Federated Learning

Authors: Haizhou Du, Yiran Xiang, Yiwen Cai, Xiufeng Liu, Zonghan Wu, Huan Huo, Guodong Long

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide rigorous theoretical convergence guarantees for FedFree and conduct extensive experiments on CIFAR-10 and CIFAR-100. Results demonstrate that FedFree achieves substantial performance gains, with relative accuracy improving up to 46.3% over state-of-the-art baselines. ... To validate the contribution of FedFree's core components, we conduct ablation studies on CIFAR-100 with 100 clients in the HtFL setting (non-IID1), shown in Figure 4.
Researcher Affiliation	Academia	Shanghai University of Electric Power, Shanghai, China; Technical University of Denmark, Kongens Lyngby, Denmark; East China Normal University, Shanghai, China; University of Technology Sydney, Sydney, Australia
Pseudocode	Yes	Algorithm 1 FedFree. 1: Server initializes global model parameters G_0. 2: for each communication round t = 1, 2, ..., T do 3: Select a subset of clients S_t (or all clients by default).
Open Source Code	Yes	Our evaluations are based on open-accessed datasets that are publicly available. An official implementation code is provided in supplemental material.
Open Datasets	Yes	Datasets. We use two standard benchmark datasets for image classification: CIFAR-10 [31] and CIFAR-100 [31].
Dataset Splits	No	To simulate real-world non-IID data distributions across clients, we partition the datasets using a method similar to [32]. Specifically, for non-IID1, in CIFAR-10, we distribute data from 1 out of 10 categories to each client (non-IID: 1/10). In CIFAR-100, we divide the data from 10 out of 100 categories for each client (non-IID: 10/100). ... Accuracy reported is the average test accuracy across all participating clients on their local test sets after the final communication round.
Hardware Specification	Yes	All experiments were implemented using Python 3.11 and PyTorch 1.11, executed on a cluster with Intel Xeon Gold 6126 CPUs and NVIDIA GTX 2080Ti / Tesla T4 GPUs.
Software Dependencies	Yes	All experiments were implemented using Python 3.11 and PyTorch 1.11, executed on a cluster with Intel Xeon Gold 6126 CPUs and NVIDIA GTX 2080Ti / Tesla T4 GPUs.
Experiment Setup	Yes	For local training, we use the SGD optimizer with a learning rate of η = 0.01 and a batch size of 10 for E = 5 local epochs per round. For FedFree, we set the critical layer selection parameter k = 2 (based on the sensitivity analysis in Appendix D.3, Figure A1) and the server-side learning rate α = 0.01 unless otherwise specified. Total communication rounds T = 200. Further details, including specific hyperparameters for baselines (tuned for fair comparison where possible), are provided in Appendix D.2.
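
The Dataset Splits row describes a label-restricted non-IID partition (1 of 10 classes per client on CIFAR-10, 10 of 100 on CIFAR-100). The sketch below illustrates one way such a partition could be built. It is not the authors' code: the paper only says it uses "a method similar to [32]", so the function name `partition_by_class`, the cyclic class assignment, and the even split of each class among the clients that hold it are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def partition_by_class(labels, num_clients, classes_per_client, seed=0):
    """Give each client samples from only `classes_per_client` classes.

    Hypothetical helper: classes are assigned to clients cyclically, and each
    class's samples are split evenly among the clients holding that class.
    Requires classes_per_client <= number of distinct labels.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)

    # Cyclic assignment: client c holds a contiguous block of classes.
    client_classes = [
        [classes[(c * classes_per_client + j) % len(classes)]
         for j in range(classes_per_client)]
        for c in range(num_clients)
    ]

    # Record which clients hold each class.
    holders = defaultdict(list)
    for c, cls_list in enumerate(client_classes):
        for cls in cls_list:
            holders[cls].append(c)

    # Shuffle each class once and deal its samples out to its holders.
    client_indices = [[] for _ in range(num_clients)]
    for cls, clients in holders.items():
        idxs = by_class[cls][:]
        rng.shuffle(idxs)
        for pos, c in enumerate(clients):
            client_indices[c].extend(idxs[pos::len(clients)])

    return client_indices, client_classes
```

Under these assumptions, `partition_by_class(labels, 100, 1)` would correspond to the 1/10 setting on CIFAR-10 labels and `partition_by_class(labels, 100, 10)` to the 10/100 setting on CIFAR-100 labels. The sketch does not cover how the per-client local test sets are formed, which the quoted passage does not specify.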