Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Federated Multi-Task Learning under a Mixture of Distributions
Authors: Othmane Marfoq, Giovanni Neglia, Aurélien Bellet, Laetitia Kameni, Richard Vidal
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on FL benchmarks show that our approach provides models with higher accuracy and fairness than state-of-the-art methods. |
| Researcher Affiliation | Collaboration | ¹Inria, Université Côte d'Azur, France, EMAIL ²Inria, Université de Lille, France, EMAIL ³Accenture Labs, France, EMAIL |
| Pseudocode | Yes | Algorithm 1: FedEM (see also the more detailed Alg. 2 in App. D.1) |
| Open Source Code | Yes | Code is available at https://github.com/omarfoq/FedEM. |
| Open Datasets | Yes | We evaluated our method on five federated benchmark datasets spanning a wide range of machine learning tasks: image classification (CIFAR10 and CIFAR100 [33]), handwritten character recognition (EMNIST [8] and FEMNIST [7]), and language modeling (Shakespeare [7, 47]). |
| Dataset Splits | Yes | For all tasks, we randomly split each local dataset into training (60%), validation (20%) and test (20%) sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'MobileNet-v2 [55]' and 'Stacked-LSTM [25]' as models but does not specify software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | For each method and each task, the learning rate and the other hyperparameters were tuned via grid search (details in App. I.2). FedAvg+ updated the local model through a single pass on the local dataset. Unless otherwise stated, the number of components considered by FedEM was M = 3, training occurred over 80 communication rounds for Shakespeare and 200 rounds for all other datasets. At each round, clients train for one epoch. |
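
The Dataset Splits row quotes a random 60% / 20% / 20% split of each client's local dataset into training, validation, and test sets. The following is a minimal sketch of how such a per-client split could be done, not the authors' implementation. The function name `split_local_dataset`, the `seed` parameter, and the choice to give any rounding remainder to the test set are all assumptions made for illustration.

```python
import random


def split_local_dataset(samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split one client's samples into (train, validation, test) lists.

    Hypothetical helper: the name, the seed handling, and the rounding rule
    are illustrative assumptions, not taken from the paper or its code.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Any remainder left by rounding down goes to the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_local_dataset(range(100))
    print(len(train), len(val), len(test))
```

In a federated setting this would be applied once per client, so that each client keeps its own validation and test data. Using a different seed per client is one option for avoiding correlated shuffles.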