Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unlocking the Potential of Model Calibration in Federated Learning

Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher Brinton

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that NUCFL offers flexibility and effectiveness across various FL algorithms, enhancing accuracy as well as model calibration.
Researcher Affiliation	Academia	Purdue University; Yonsei University; University at Buffalo-SUNY
Pseudocode	Yes	Algorithm 1: General FL Framework; Algorithm 2: Applying NUCFL to FL
Open Source Code	No	The paper does not explicitly provide a link to open-source code for the methodology described, nor does it state that the code is available in supplementary materials or upon publication.
Open Datasets	Yes	We conduct experiments using four image classification datasets commonly utilized in FL research (Caldas et al., 2018; McMahan et al., 2017; Mohri et al., 2019): MNIST (LeCun et al., 1998), FEMNIST (Cohen et al., 2017), CIFAR-10 (Krizhevsky, 2009), and CIFAR-100 (Krizhevsky, 2009).
Dataset Splits	Yes	In the IID setup, data samples from each class are distributed equally to M = 50 clients. To simulate non-IID conditions across clients, we follow Hsu et al. (2019), Nguyen et al. (2023), and Chen et al. (2023) and partition the training set into M = 50 clients using a Dirichlet distribution with α = 0.5.
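The two splitting schemes in the Dataset Splits row can be sketched as below. This is a minimal stdlib-only illustration, not the authors' code: the function names `iid_partition` and `dirichlet_partition`, the seeding, and the per-class Dirichlet draw (the common Hsu et al. style of label-skew partitioning) are assumptions. A symmetric Dirichlet vector is sampled by normalizing independent Gamma(α, 1) draws, and each client receives that fraction of every class.

```python
import random
from collections import defaultdict


def _group_by_class(labels):
    """Map each class label to the list of sample indices with that label."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    return by_class


def iid_partition(labels, num_clients=50, seed=0):
    """Deal each class's (shuffled) samples round-robin so clients get equal shares."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for c, idxs in sorted(_group_by_class(labels).items()):
        idxs = idxs[:]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            clients[j % num_clients].append(idx)
    return clients


def dirichlet_partition(labels, num_clients=50, alpha=0.5, seed=0):
    """Non-IID split: for each class draw p ~ Dir(alpha) over clients (hypothetical helper)."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for c, idxs in sorted(_group_by_class(labels).items()):
        idxs = idxs[:]
        rng.shuffle(idxs)
        # Symmetric Dirichlet sample via normalized Gamma(alpha, 1) draws.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        # Turn the proportions into monotone cut points over this class's samples.
        cuts, acc = [], 0.0
        for x in g[:-1]:
            acc += x / total
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds = [0] + cuts + [len(idxs)]
        for k in range(num_clients):
            clients[k].extend(idxs[bounds[k]:bounds[k + 1]])
    return clients
```

With a smaller α the per-class proportions become more skewed, so most of a class tends to land on a few clients; α = 0.5 gives moderate label heterogeneity. Every sample index is assigned to exactly one client in both schemes.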
Hardware Specification	Yes	We run all experiments on a 3-GPU cluster of Tesla V100 GPUs, with each GPU having 32GB of memory.
Software Dependencies	No	The paper mentions using the SGD optimizer but does not specify versions for any key software components like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	We run each FL algorithm for 100 rounds, evaluating the final global model, with 5 local training epochs per round. We use the SGD optimizer with a learning rate of 10⁻³, weight decay of 10⁻⁴, and momentum of 0.9. For additional details on the training specifics of each algorithm, please see Appendix A.2.
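The reported setup (100 rounds, 5 local epochs, SGD with learning rate 10⁻³, weight decay 10⁻⁴, momentum 0.9) can be sketched as a toy loop. This is an illustration under stated assumptions, not the paper's training code: the update rule follows the coupled L2 weight-decay convention of PyTorch's `torch.optim.SGD` (dampening 0, no Nesterov), aggregation is a generic FedAvg-style size-weighted average, the momentum buffer is reset every round, and `FLConfig`, `sgd_step`, `fedavg`, `run_fl`, `steps_per_epoch`, and the quadratic client losses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FLConfig:
    """Hyperparameters as reported in the Experiment Setup row."""
    num_clients: int = 50
    rounds: int = 100
    local_epochs: int = 5
    lr: float = 1e-3
    weight_decay: float = 1e-4
    momentum: float = 0.9


def sgd_step(params, grads, bufs, cfg):
    """One SGD step: d = g + wd*p; b = mu*b + d; p = p - lr*b (PyTorch-style, assumed)."""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        d = g + cfg.weight_decay * p  # coupled L2 weight-decay term
        b = cfg.momentum * b + d      # momentum buffer update
        new_params.append(p - cfg.lr * b)
        new_bufs.append(b)
    return new_params, new_bufs


def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size (assumed rule)."""
    total = sum(client_sizes)
    return [
        sum(n * ps[j] for ps, n in zip(client_params, client_sizes)) / total
        for j in range(len(client_params[0]))
    ]


def run_fl(targets, sizes, cfg, steps_per_epoch=10):
    """Toy FL loop where client k minimizes the hypothetical loss 0.5*||w - targets[k]||^2."""
    w = [0.0] * len(targets[0])
    for _ in range(cfg.rounds):
        local = []
        for t in targets:
            p, b = list(w), [0.0] * len(w)  # fresh optimizer state each round (assumption)
            for _ in range(cfg.local_epochs * steps_per_epoch):
                grads = [pi - ti for pi, ti in zip(p, t)]
                p, b = sgd_step(p, grads, b, cfg)
            local.append(p)
        w = fedavg(local, sizes)
    return w
```

For example, `run_fl([[1.0], [3.0]], [1, 3], FLConfig())` ends close to the size-weighted mean 2.5 of the two client targets, slightly shrunk by weight decay. Real experiments would replace the quadratic losses with minibatch gradients of a neural network on each client's partition.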