Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity
Authors: Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy Sonne Alstrøm, Sebastian U. Stich
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin. [...] We empirically verify our theoretical statements on strongly convex and DNN-based non-convex functions. [...] We empirically demonstrate that using Fedssyn on top of several popular FL algorithms reduces communication cost and improves the Top-1 accuracy. Results hold across multiple datasets, different levels of data heterogeneity, number of clients, and participation rates. |
| Researcher Affiliation | Academia | Bo Li EMAIL Technical University of Denmark, Yasin Esfandiari EMAIL CISPA Helmholtz Center for Information Security, Mikkel N. Schmidt EMAIL Technical University of Denmark, Tommy S. Alstrøm EMAIL Technical University of Denmark, Sebastian U. Stich EMAIL CISPA Helmholtz Center for Information Security |
| Pseudocode | Yes | Algorithm 1 (Fedssyn) and Algorithm 2 (Federated Averaging, FedAvg) are explicitly provided in the paper, detailing the procedural steps of the proposed framework and the underlying federated learning algorithm. |
| Open Source Code | Yes | We provide downloadable source code as part of the supplementary material. This code allows to reproduce our experimental evaluations shown in the main part of the manuscript. The code for the additional verification on the dSprites dataset in Appendix A.4 is not part of the submitted file. We will make this code and the generated datasets (such as in Figure A.10) available on a public GitHub repository. |
| Open Datasets | Yes | We show the effectiveness of our proposed method on CIFAR10 and CIFAR100 (Krizhevsky, 2009) image classification tasks. [...] Experiments using MNIST and dSprites are in Appendix A.3.1 and A.4. [...] The dSprites dataset (Matthey et al., 2017) contains three different types of shapes with different colours, scales, rotations, and locations. |
| Dataset Splits | Yes | We partition the training dataset using Dirichlet distribution with a concentration parameter α to simulate the heterogeneous scenarios following Lin et al. (2020). [...] We pick α ∈ {0.01, 0.1} as they are commonly used (Yu et al., 2022; Lin et al., 2020). [...] Each client trains DDPM by using 75% of their local data as the training data. [...] We first randomly select 10% of the images from the dSprites dataset to formulate the test set. [...] For the rest of the dataset, we split them among 12 clients (N = 12) based on the four spatial locations (top-left, bottom-left, top-right, or bottom-right) and three shapes (square, ellipse, or heart) such that each client only sees a single type of shape from one of the four pre-defined locations. The number of images on each client is the same (ni = 55296). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions computation requirements for clients in a general sense. |
| Software Dependencies | No | Following Dockhorn et al. (2022), we train the DP-DDPM for 500 epochs under the private setting (εDP = 10, δ = 1e-5) with the noise multiplier as 1.906 and clipping threshold 1.0 based on Opacus (Yousefpour et al., 2021) on each client. While Opacus is mentioned, a specific version number is not provided, and no other key software components with version numbers are listed. |
| Experiment Setup | Yes | We use class-conditional DDPM Ho et al. (2020) on each client. We assume that all the clients participate in training DDPM by using 75% of their local data as the training data. Each client trains DDPM with a learning rate of 0.0001, 1000 diffusion time steps, 256 batch size, and 500 epochs. These hyperparameters are the same for all experiments. [...] We use VGG-11 for all the experiments. [...] We use N ∈ {10, 40, 100} as the number of clients and C ∈ {0.1, 0.2, 0.4, 1.0} as the participation rate. For partial participation, we randomly sample N·C clients per communication round. We use a batch size of 256, 10 local epochs, and the number of gradient steps 10·ni/256. We tune the learning rate from {0.01, 0.05, 0.1} with the local validation dataset. |
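The Dirichlet partitioning quoted under Dataset Splits (following Lin et al., 2020) can be sketched in plain Python. This is a minimal illustration of the technique, not the authors' code; the function name and the toy labels are hypothetical, and Dirichlet proportions are drawn via normalized Gamma samples so only the standard library is needed.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Partition sample indices across clients with a per-class
    Dirichlet(alpha) split. Smaller alpha -> more heterogeneity."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        # Dirichlet proportions = normalized independent Gamma draws.
        gams = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gams)
        start = 0
        for k in range(num_clients):
            take = int(round(gams[k] / total * len(idx)))
            clients[k].extend(idx[start:start + take])
            start += take
        clients[-1].extend(idx[start:])  # leftover from rounding
    return clients

labels = [i % 10 for i in range(1000)]  # toy 10-class label list
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
```

With α = 0.1 most clients end up dominated by a few classes, which is why the paper pairs these small α values with its heterogeneity experiments.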
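The FedAvg aggregation the paper presents as Algorithm 2, combined with the partial-participation sampling from Experiment Setup (N·C clients per round), can be sketched as follows. A minimal sketch with hypothetical function names and flat weight vectors standing in for the real VGG-11 parameters; it is not the authors' implementation.

```python
import random

def fedavg_aggregate(client_weights, client_sizes):
    """Average client models, weighted by local dataset size n_i."""
    total = sum(client_sizes)
    agg = [0.0] * len(client_weights[0])
    for w, n in zip(client_weights, client_sizes):
        for j, wj in enumerate(w):
            agg[j] += (n / total) * wj
    return agg

def sample_clients(num_clients, participation, rng):
    """Randomly pick N*C clients for one communication round."""
    k = max(1, int(num_clients * participation))
    return rng.sample(range(num_clients), k)

rng = random.Random(0)
chosen = sample_clients(100, 0.1, rng)  # 10 of 100 clients
w = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [10, 30])  # -> [2.5, 3.5]
```

The size weighting (n_i / Σ n_i) is what makes FedAvg robust to unequal local dataset sizes; with the equal-sized clients used in the dSprites setup it reduces to a plain mean.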
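The Software Dependencies row notes DP-DDPM training via Opacus with noise multiplier 1.906 and clipping threshold 1.0. The DP-SGD mechanism that Opacus automates (per-sample gradient clipping followed by Gaussian noise) can be illustrated in plain Python; this sketch shows the mechanism only and does not use Opacus's actual API.

```python
import math
import random

def dp_aggregate(per_sample_grads, clip_norm=1.0,
                 noise_multiplier=1.906, rng=None):
    """Clip each per-sample gradient to L2 norm clip_norm, sum,
    add Gaussian noise with std = noise_multiplier * clip_norm,
    and average over the batch (the DP-SGD update direction)."""
    rng = rng or random.Random(0)
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            summed[j] += g[j] * scale
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_sample_grads) for x in noisy]
```

The (ε, δ) guarantee quoted in the table (εDP = 10, δ = 1e-5) comes from accounting over many such noisy steps, which Opacus handles internally.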