Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provably Personalized and Robust Federated Learning

Authors: Mariel Werner, Lie He, Michael Jordan, Martin Jaggi, Sai Praneeth Karimireddy

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically verify our theoretical assumptions and demonstrate experimentally that our learning algorithms benefit from collaboration, scale with the number of collaborators, are competitive with SOTA personalized federated learning algorithms, and are not sensitive to the model's initial weights" (Section 4).
Researcher Affiliation | Academia | Mariel Werner (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley); Lie He (Machine Learning and Optimization Laboratory (MLO), EPFL, Switzerland); Michael Jordan (EECS, UC Berkeley); Martin Jaggi (MLO, EPFL, Switzerland); Sai Praneeth Karimireddy (EECS, UC Berkeley)
Pseudocode | Yes | Algorithm 1: Myopic-Clustering; Algorithm 2: Federated-Clustering; Algorithm 3: Threshold-Clustering; Algorithm 4: Momentum-Clustering
Open Source Code | No | The paper states: "All algorithms are implemented with PyTorch (Paszke et al., 2017)." However, it does not explicitly state that the authors' own code for the methodology described in the paper is open-source, nor does it provide a link to a repository.
Open Datasets | Yes | "We use the MNIST dataset (LeCun et al., 2010) and CIFAR dataset (Krizhevsky, 2009) to compare our proposed algorithm, Federated-Clustering, with existing state-of-the-art federated learning algorithms."
Dataset Splits | Yes | "The data samples are randomly shuffled and split into K = 4 clusters with n_i = 75 clients in each cluster." For the CIFAR-10 experiment: "we create 4 clusters, each containing 5 clients and transform the labels in each cluster such that different clusters can have different labels for the same image."
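The clustered split described in the row above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the helper name `make_clustered_split` and the use of a per-cluster random label permutation are assumptions based on the quoted description of the private-label task.

```python
import random

def make_clustered_split(samples, num_clusters=4, clients_per_cluster=5,
                         num_classes=10, seed=0):
    """Shuffle samples, partition them across clients grouped into clusters,
    and apply a cluster-specific label permutation so the same image can
    carry different labels in different clusters."""
    rng = random.Random(seed)
    samples = samples[:]              # list of (image_id, label) pairs
    rng.shuffle(samples)

    num_clients = num_clusters * clients_per_cluster
    shards = [samples[i::num_clients] for i in range(num_clients)]

    clusters = []
    for c in range(num_clusters):
        perm = list(range(num_classes))
        rng.shuffle(perm)             # cluster-specific relabeling
        clients = []
        for k in range(clients_per_cluster):
            shard = shards[c * clients_per_cluster + k]
            clients.append([(x, perm[y]) for x, y in shard])
        clusters.append(clients)
    return clusters

# Toy usage: 200 samples with labels 0..9
data = [(i, i % 10) for i in range(200)]
clusters = make_clustered_split(data)
```

With these defaults the 200 samples are distributed evenly over 4 clusters of 5 clients each, and every cluster sees labels through its own permutation.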
Hardware Specification | No | The paper does not describe the specific hardware used for its experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | "All algorithms are implemented with PyTorch (Paszke et al., 2017). ... generated using the scikit-learn package (Pedregosa et al., 2011)." The paper mentions PyTorch and scikit-learn but does not provide version numbers for these dependencies, which is required for reproducibility.
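The missing version pinning noted above is straightforward to record. A minimal sketch of capturing installed dependency versions with the standard library (the package names passed in are assumptions; the paper does not list them beyond PyTorch and scikit-learn):

```python
from importlib import metadata

def pinned_versions(packages):
    """Return the installed version of each distribution, marking any that
    are absent, so exact dependency versions can be reported."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

versions = pinned_versions(["torch", "scikit-learn"])
```

Emitting such a mapping alongside experimental results would satisfy the version-reporting requirement this row flags.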
Experiment Setup | Yes | "We tune the learning rate separately for each algorithm through grid search, but preserve all other algorithmic setups." For the CIFAR-10 experiment: "we create 4 clusters, each containing 5 clients and transform the labels in each cluster such that different clusters can have different labels for the same image (the private label task in Section 4.2). We train a VGG-16 model (Simonyan & Zisserman, 2015) with batch size 32, learning rate 0.1, and momentum 0.9."
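The learning-rate tuning quoted above is a plain grid search with all other hyperparameters held fixed. A minimal sketch, where the candidate grid and the toy objective are placeholders (the paper does not report its grid values):

```python
def grid_search_lr(train_eval, grid=(1.0, 0.3, 0.1, 0.03, 0.01)):
    """Try each candidate learning rate and keep the one that minimizes the
    validation loss returned by train_eval, all else held fixed."""
    best_lr, best_loss = None, float("inf")
    for lr in grid:
        loss = train_eval(lr)  # train with this lr, return validation loss
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

# Toy stand-in for a training run whose loss is minimized near lr = 0.1
best_lr, best_loss = grid_search_lr(lambda lr: (lr - 0.1) ** 2)
```

In the actual experiments, `train_eval` would train the given algorithm (e.g. VGG-16 with batch size 32 and momentum 0.9) and report its validation loss.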