Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks

Authors: Nurbek Tastan, Samuel Horváth, Karthik Nandakumar

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically study the convergence of our proposed approach and empirically validate it using extensive experiments on different datasets and architectures. We also extend our approach to enable training-time model reward allocation. The code can be found at https://github.com/tnurbek/aequa.
Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE; 2 Michigan State University (MSU), Michigan, USA. Correspondence to: Nurbek Tastan <EMAIL>.
Pseudocode | Yes | Algorithm 1: Aequa: Federated optimization; Algorithm 2: Aequa (with training-time model rewards)
Open Source Code | Yes | The code can be found at https://github.com/tnurbek/aequa.
Open Datasets | Yes | We use the following datasets to carry out our experiments (following (Li et al., 2020)): MNIST (LeCun, 1998), Fashion-MNIST (FMNIST) (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 & CIFAR-100 (Krizhevsky et al., 2009), Stanford Sentiment Treebank (SST) (Socher et al., 2013), and the federated handwriting dataset FEMNIST (Caldas et al., 2019).
Dataset Splits | Yes | The other datasets are partitioned using the following strategies: (i) homogeneous, where each participant gets an equal number of data points per class; (ii) heterogeneous, where each client gets a varying number of data points per class based on a Dirichlet(α) distribution (the concentration parameter α reflects the degree of non-i.i.d. characteristics within the dataset); (iii) quantity skew, which allocates a κ proportion of the total data points to each of the m selected participants, while the remaining N − m participants split the remaining data equally; (iv) label skew, denoted by #C = m, which creates a label imbalance by sampling m classes for each client and then randomly distributing the samples of those classes among the selected participants.
Hardware Specification | Yes | All experiments were carried out on NVIDIA A100-SXM4-40GB GPUs, with each run utilizing a single GPU.
Software Dependencies | No | The paper mentions "SGD with momentum" and "learning rate scheduler" as part of the implementation details, but does not specify software dependencies such as specific library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | We use cross-entropy loss for all image and language classification tasks and maintain consistent training hyperparameters across all experiments. The optimizer of choice is SGD with momentum, with a default initial learning rate of 0.01. A learning rate scheduler reduces the learning rate by a factor of 0.1 at rounds 50 and 75 when the total number of communication rounds is set to 100. The number of communication rounds is set as follows: CIFAR-10, CIFAR-100, and SST: T = 100; MNIST, FMNIST, and SVHN: T = 50. In each round, clients perform one local epoch of training. The batch size is fixed at 128 across all experiments.
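The heterogeneous (Dirichlet) split described under "Dataset Splits" can be sketched as follows. This is an illustrative helper, not code from the Aequa repository: `dirichlet_partition` and its parameters are assumed names, and the Dirichlet(α) draw is built from stdlib Gamma samples (a Dirichlet sample is a vector of i.i.d. Gamma(α, 1) draws normalized to sum to 1).

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet(alpha)
    proportions. Smaller alpha -> more heterogeneous (non-i.i.d.) splits."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    client_indices = [[] for _ in range(num_clients)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Dirichlet(alpha) draw: normalized Gamma(alpha, 1) samples.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        proportions = [g / total for g in gammas]
        # Carve this class's indices into contiguous chunks per client.
        start, cum = 0, 0.0
        for client in range(num_clients):
            cum += proportions[client]
            end = len(idx) if client == num_clients - 1 else int(round(cum * len(idx)))
            client_indices[client].extend(idx[start:end])
            start = end
    return client_indices
```

Every index is assigned to exactly one client, and lowering α concentrates each class on fewer clients, matching the role of the concentration parameter described above.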
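The step schedule in the experiment setup (initial LR 0.01, decayed by 0.1 at rounds 50 and 75 for T = 100) can be written as a small function; `learning_rate` is a hypothetical name for illustration, not part of the paper's code.

```python
def learning_rate(round_idx, base_lr=0.01, milestones=(50, 75), gamma=0.1):
    """Step LR schedule: multiply base_lr by gamma once for each
    milestone round that has been reached."""
    decays = sum(1 for m in milestones if round_idx >= m)
    return base_lr * gamma ** decays
```

For example, rounds 0-49 use 0.01, rounds 50-74 use 0.001, and rounds 75-99 use 0.0001; the same effect is what `torch.optim.lr_scheduler.MultiStepLR` provides in PyTorch.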