Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

Authors: Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone, Marco Ciccone

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance under random uniform client sampling, particularly in large-scale settings with high data heterogeneity and low client participation. Section 5 (Experimental Results): We present evidence both in controlled and real-world scenarios, showing that: (i) the GHBM formulation is pivotal to enable momentum to provide an effective correction even in extreme heterogeneity, (ii) our adaptive Local GHBM effectively exploits client participation to enhance communication efficiency and (iii) GHBM is suitable for cross-device scenarios, with stark improvement on large datasets and architectures.
Researcher Affiliation | Collaboration | Riccardo Zaccone (Politecnico di Torino, EMAIL); Sai Praneeth Karimireddy (USC Viterbi School of Engineering, EMAIL); Carlo Masone (Politecnico di Torino, EMAIL); Marco Ciccone (Vector Institute, EMAIL)
Pseudocode | Yes | Algorithm 1: GHBM, Local GHBM and FedAvg ... Algorithm 2: GHBM (practical version) ... Algorithm 3: GHBM (theory version)
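The paper's full method is given in its Algorithms 1-3; as context for the pseudocode row, the following is a minimal runnable sketch of the classical server-side heavy-ball (FedAvgM-style) update that GHBM generalizes. It is not the authors' implementation: the function names (`client_update`, `server_round`) and the toy quadratic clients are hypothetical, and GHBM's generalized momentum differs from the plain `beta * m + delta` recursion shown here.

```python
def client_update(weights, grad_fn, lr=0.1, local_steps=4):
    """Run a client's local SGD steps and return the pseudo-gradient
    (the difference between the locally updated and the global model)."""
    w = list(weights)
    for _ in range(local_steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return [wi - w0 for wi, w0 in zip(w, weights)]

def server_round(weights, momentum, deltas, beta=0.9, server_lr=1.0):
    """Classical heavy-ball server step:
    m <- beta * m + avg(client deltas);  x <- x + server_lr * m."""
    avg = [sum(ds) / len(deltas) for ds in zip(*deltas)]
    momentum = [beta * mi + ai for mi, ai in zip(momentum, avg)]
    weights = [wi + server_lr * mi for wi, mi in zip(weights, momentum)]
    return weights, momentum
```

With heterogeneous quadratic clients (each pulling toward a different optimum), iterating `server_round` drives the global model toward the average of the client optima, which is the behavior the momentum correction is meant to stabilize.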
Open Source Code | Yes | Code is available at https://github.com/RickZack/GHBM
Open Datasets | Yes | Extensive experiments on vision and language tasks confirm our theoretical findings, demonstrating that GHBM substantially improves state-of-the-art performance... We present evidence both in controlled and real-world scenarios... For the controlled scenarios, we employ Cifar-10/100 as computer vision tasks, with ResNet-20 and the same CNN similar to LeNet-5 commonly used in FL works (Hsu et al., 2020), and the Shakespeare dataset as NLP task, following (Reddi et al., 2021; Karimireddy et al., 2021). ... For simulating real-world scenarios, we adopt the large-scale GLDv2 and INaturalist datasets as CV tasks... and the Stack Overflow dataset as NLP task, following Reddi et al. (2021); Karimireddy et al. (2021).
Dataset Splits | Yes | For Cifar-10/100 we follow the common practice of Hsu et al. (2020), sampling local datasets according to a Dirichlet distribution with concentration parameter α, denoting as non-iid and iid respectively the splits corresponding to α = 0 and α = 10,000 (additional details in Appendix C.2). For Shakespeare we use instead the predefined splits (Caldas et al., 2019). The datasets are partitioned among K = 100 clients, selecting a portion C = 10% of them at each round. ... Table 6: Details about the dataset splits used for our experiments (datasets: Cifar-10, Cifar-100, Shakespeare, Stack Overflow, GLDv2, INaturalist; columns: Clients, Number of clients per round, Number of classes, Avg. examples per client, Number of local steps, Average participation (round no.))
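The Dirichlet-based split of Hsu et al. (2020) quoted above can be sketched as follows. This is an illustrative stdlib-only implementation, not the authors' code: the function name `dirichlet_partition` is hypothetical, and the Dirichlet draw is built from normalized Gamma samples since Python's `random` module has no direct Dirichlet sampler.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Assign example indices to clients, drawing each class's
    per-client proportions from a symmetric Dirichlet(alpha).
    Small alpha -> highly non-iid splits; large alpha -> near-iid."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet via normalized Gamma(alpha, 1) samples
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        props = [x / total for x in g]
        # cumulative cut points so the class is partitioned exactly
        cum, start = 0.0, 0
        for c, p in enumerate(props):
            cum += p
            end = len(idxs) if c == num_clients - 1 else int(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients
```

Every index is assigned to exactly one client for any alpha, so the iid (large alpha) and non-iid (small alpha) regimes differ only in how skewed each client's class mixture is.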
Hardware Specification | Yes | The federated learning setup is simulated on a single node equipped with 11 Intel(R) Core(TM) i7-6850K CPUs and 4 NVIDIA GeForce GTX 1070 GPUs. For the large-scale experiments we used the computing capabilities offered by the LEONARDO cluster of CINECA-HPC, employing nodes equipped with one 32-core Intel(R) Xeon 8358 CPU (2.6 GHz) and 4 NVIDIA A100 SXM6 64GB (VRAM) GPUs.
Software Dependencies | Yes | We implemented all the tested algorithms and training procedures in a single codebase, using the PyTorch 1.10 framework, compiled with CUDA 10.2.
Experiment Setup | Yes | The training round budget T is set large enough for all algorithms to reach convergence in the worst-case scenario (α = 0), constrained by a time budget for the simulations. Since our proposed algorithm is always faster, this ensures a fair comparison with competitors. Results are always reported as the average over 5 independent runs, performed with the best-performing hyperparameters, extensively searched separately for all competitor algorithms. All experiments are conducted under random uniform client sampling, as is standard practice. ... Table 7: Hyper-parameter search grid for each combination of method and dataset (for α = 0). The best values are indicated in bold.
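The protocol quoted above (grid-search each method's hyperparameters, then report the mean over 5 independent runs of the best setting) can be sketched generically. This is an assumed illustration, not the paper's tooling: `grid_search` and `train_eval` are hypothetical names, and the real experiments of course run full federated training rather than a scalar callback.

```python
import itertools
import statistics

def grid_search(train_eval, grid, seeds=(0, 1, 2, 3, 4)):
    """Evaluate every hyper-parameter combination in `grid`,
    averaging the final metric over independent seeds, and
    return (best_config, best_mean_score)."""
    best = None
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        scores = [train_eval(cfg, seed) for seed in seeds]
        mean = statistics.mean(scores)
        if best is None or mean > best[1]:
            best = (cfg, mean)
    return best
```

Searching per method over the same grid and only then averaging over seeds matches the fairness argument in the row above: each competitor is compared at its own best-performing configuration.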