Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Trustworthy Federated Learning with Untrusted Participants
Authors: Youssef Allouah, Rachid Guerraoui, John Stephan
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on standard benchmarks validate CAFCOR's practicality, showing that privacy and robustness can coexist in distributed systems without sacrificing utility or trusting the server. |
| Researcher Affiliation | Academia | EPFL, Switzerland. Alphabetical order. Correspondence to: Youssef Allouah <EMAIL>, John Stephan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 CAFCOR Input: Initial model θ0; DP noise levels σind, σcor; batch size b; clipping threshold C; learning rates {γt}; momentum coefficients {βt}; number of iterations T. Algorithm 2 CAF: Covariance bound-Agnostic Filter Input: vectors x1, ..., xn ∈ ℝ^d; bound on number of corrupt inputs 0 ≤ f < n |
| Open Source Code | No | To facilitate reproducibility, we intend to publicly release our code. |
| Open Datasets | Yes | We consider two widely used image classification datasets: MNIST (LeCun & Cortes, 2010) and Fashion-MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | On MNIST, we use batch size b = 50, learning rate γ = 0.075, momentum parameter β = 0.85, and clipping parameter C = 2.25. For Fashion-MNIST, we use b = 100, γ = 0.3, β = 0.9, and C = 1. For both datasets, we train for T = 30 iterations and apply ℓ2-regularization at 10⁻⁴. We adopt user-level DP across all threat models. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory details). It only mentions general concepts like 'distributed environment'. |
| Software Dependencies | No | To estimate the privacy budgets achieved at the end of training, we use Opacus (Yousefpour et al., 2021). The citation mentions 'User-friendly differential privacy library in pytorch, 2021', implying PyTorch but no specific version number for PyTorch or Opacus is provided. |
| Experiment Setup | Yes | On MNIST, we use batch size b = 50, learning rate γ = 0.075, momentum parameter β = 0.85, and clipping parameter C = 2.25. For Fashion-MNIST, we use b = 100, γ = 0.3, β = 0.9, and C = 1. For both datasets, we train for T = 30 iterations and apply ℓ2-regularization at 10⁻⁴. We adopt user-level DP across all threat models. On MNIST, the privacy budgets reach ε = 26.4 and 27.8 for f = 10 and f = 5, respectively. On Fashion-MNIST, the privacy budget is ε = 39.6 for both values of f. Throughout, we set δ = 10⁻⁴. |
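
For readers who want to re-run the reported setup, the hyperparameters quoted in the Experiment Setup row could be collected as in the sketch below. This is a minimal illustration, not the authors' code, which the table reports as unreleased. The names `ExperimentConfig`, `CONFIGS`, and `clip_l2` are hypothetical, the regularization strength and δ are read as 10^-4 (an assumption), and `clip_l2` is generic L2-norm clipping as commonly used in DP training, not a confirmed part of CAFCOR.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    dataset: str
    batch_size: int        # b
    learning_rate: float   # gamma
    momentum: float        # beta
    clip_threshold: float  # C
    iterations: int = 30   # T, shared by both datasets
    l2_reg: float = 1e-4   # assumption: regularization strength read as 10^-4
    delta: float = 1e-4    # assumption: DP parameter delta read as 10^-4


CONFIGS = {
    "mnist": ExperimentConfig("mnist", 50, 0.075, 0.85, 2.25),
    "fashion_mnist": ExperimentConfig("fashion_mnist", 100, 0.3, 0.9, 1.0),
}


def clip_l2(vector, threshold):
    """Rescale `vector` so its L2 norm is at most `threshold` (generic clipping)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0 or norm <= threshold:
        return list(vector)
    scale = threshold / norm
    return [v * scale for v in vector]
```

As a usage example, `clip_l2(gradient, CONFIGS["mnist"].clip_threshold)` would bound a gradient's norm by C = 2.25 before any noise is added. The noise levels σind and σcor are not given in the table, so they are left out of the sketch.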