Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Aequa: Fair Model Rewards in Collaborative Learning via Slimmable Networks

Authors: Nurbek Tastan, Samuel Horváth, Karthik Nandakumar

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically study the convergence of our proposed approach and empirically validate it using extensive experiments on different datasets and architectures. We also extend our approach to enable training-time model reward allocation. The code can be found at https://github.com/tnurbek/aequa.
Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE; 2 Michigan State University (MSU), Michigan, USA. Correspondence to: Nurbek Tastan <EMAIL>.
Pseudocode | Yes | Algorithm 1: Aequa: Federated optimization; Algorithm 2: Aequa (with training-time model rewards)
Open Source Code | Yes | The code can be found at https://github.com/tnurbek/aequa.
Open Datasets | Yes | We use the following datasets to carry out our experiments (following (Li et al., 2020)): MNIST (LeCun, 1998), Fashion-MNIST (FMNIST) (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 & CIFAR-100 (Krizhevsky et al., 2009), Stanford Sentiment Treebank (SST) (Socher et al., 2013), and the federated handwriting dataset FEMNIST (Caldas et al., 2019).
Dataset Splits | Yes | The other datasets are partitioned using the following strategies: (i) homogeneous, where each participant gets an equal number of data points per class; (ii) heterogeneous, where each client gets a varying number of data points per class based on a Dirichlet(α) distribution (the concentration parameter α reflects the degree of non-i.i.d. characteristics within the dataset); (iii) quantity skew, which allocates a κ proportion of the total data points to each of the m selected participants, while the remaining N − m participants split the remaining data equally; (iv) label skew, denoted by #C = m, which creates a label imbalance by sampling m classes for each client and then randomly distributing the samples of those classes among the selected participants.
Hardware Specification | Yes | All experiments were carried out on NVIDIA A100-SXM4-40GB GPUs, with each run utilizing a single GPU.
Software Dependencies | No | The paper mentions "SGD with momentum" and "learning rate scheduler" as part of the implementation details, but does not specify software dependencies such as specific library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | We use cross-entropy loss for all image and language classification tasks and maintain consistent training hyperparameters across all experiments. The optimizer of choice is SGD with momentum, with a default initial learning rate of 0.01. A learning rate scheduler reduces the learning rate by a factor of 0.1 at rounds 50 and 75 when the total number of communication rounds is set to 100. The number of communication rounds is set as follows: CIFAR-10, CIFAR-100, and SST: T = 100; MNIST, FMNIST, and SVHN: T = 50. In each round, clients perform one local epoch of training. The batch size is fixed at 128 across all experiments.
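The heterogeneous (Dirichlet) split described under "Dataset Splits" can be sketched as follows. This is an illustrative helper, not code from the Aequa repository: `dirichlet_partition` and its parameters are assumed names, and the Dirichlet(α) draw is built from stdlib Gamma samples (a Dirichlet sample is a vector of i.i.d. Gamma(α, 1) draws normalized to sum to 1).

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet(alpha)
    proportions. Smaller alpha -> more heterogeneous (non-i.i.d.) splits."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    client_indices = [[] for _ in range(num_clients)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Dirichlet(alpha) draw: normalized Gamma(alpha, 1) samples.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        proportions = [g / total for g in gammas]
        # Carve this class's indices into contiguous chunks per client.
        start, cum = 0, 0.0
        for client in range(num_clients):
            cum += proportions[client]
            end = len(idx) if client == num_clients - 1 else int(round(cum * len(idx)))
            client_indices[client].extend(idx[start:end])
            start = end
    return client_indices
```

Every index is assigned to exactly one client, and lowering α concentrates each class on fewer clients, matching the role of the concentration parameter described above.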
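The step schedule in the experiment setup (initial LR 0.01, decayed by 0.1 at rounds 50 and 75 for T = 100) can be written as a small function; `learning_rate` is a hypothetical name for illustration, not part of the paper's code.

```python
def learning_rate(round_idx, base_lr=0.01, milestones=(50, 75), gamma=0.1):
    """Step LR schedule: multiply base_lr by gamma once for each
    milestone round that has been reached."""
    decays = sum(1 for m in milestones if round_idx >= m)
    return base_lr * gamma ** decays
```

For example, rounds 0-49 use 0.01, rounds 50-74 use 0.001, and rounds 75-99 use 0.0001; the same effect is what `torch.optim.lr_scheduler.MultiStepLR` provides in PyTorch.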