Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Unlocking the Potential of Model Calibration in Federated Learning
Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher Brinton
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that NUCFL offers flexibility and effectiveness across various FL algorithms, enhancing accuracy as well as model calibration. |
| Researcher Affiliation | Academia | ¹Purdue University, ²Yonsei University, ³University at Buffalo-SUNY |
| Pseudocode | Yes | Algorithm 1 General FL Framework Algorithm 2 Applying NUCFL to FL |
| Open Source Code | No | The paper does not explicitly provide a link to open-source code for the methodology described, nor does it state that the code is available in supplementary materials or upon publication. |
| Open Datasets | Yes | We conduct experiments using four image classification datasets commonly utilized in FL research (Caldas et al., 2018; McMahan et al., 2017; Mohri et al., 2019): MNIST (LeCun et al., 1998), FEMNIST (Cohen et al., 2017), CIFAR-10 (Krizhevsky, 2009), and CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | In the IID setup, data samples from each class are distributed equally to M = 50 clients. To simulate non-IID conditions across clients, we follow (Hsu et al., 2019; Nguyen et al., 2023; Chen et al., 2023) to partition the training set into M = 50 clients using a Dirichlet distribution with α = 0.5. |
| Hardware Specification | Yes | We run all experiments on a 3-GPU cluster of Tesla V100 GPUs, with each GPU having 32GB of memory. |
| Software Dependencies | No | The paper mentions using the SGD optimizer but does not specify versions for any key software components like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We run each FL algorithm for 100 rounds, evaluating the final global model, with 5 epochs for each local training. We use the SGD optimizer with a learning rate of 10⁻³, weight decay of 10⁻⁴, and momentum of 0.9. For additional details on the training specifics of each algorithm, please see Appendix A.2. |
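
The non-IID split quoted in the Dataset Splits row (M = 50 clients, Dirichlet distribution with α = 0.5) can be illustrated with a minimal sketch. This is not the authors' code, which the paper does not release; it shows one common label-skew variant in which each class's samples are divided among clients according to a Dirichlet draw. The function name `dirichlet_partition`, the `seed` parameter, and the synthetic labels are assumptions made for illustration, and the procedures in the cited works may differ in detail.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients=50, alpha=0.5, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions.

    Hypothetical helper (not from the paper). For every class, draw
    p ~ Dir(alpha, ..., alpha) over clients and hand out that class's
    samples according to p. Smaller alpha gives more skewed clients.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) draws, normalised.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        # Walk cumulative proportions to get cut points into the shuffled list.
        start, cum = 0, 0.0
        for c, g in enumerate(gammas):
            cum += g / total
            # The last client takes whatever remains, so no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Synthetic stand-in for real labels: 1000 samples over 10 classes.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=50, alpha=0.5)
print(sorted(len(p) for p in parts))  # client sizes vary widely under alpha = 0.5
```

Every sample is assigned to exactly one client, and client sizes are uneven by construction. With a small α such as 0.5, some clients may receive very few samples, or none at all, from a given class.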