Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Trustworthy Federated Learning with Untrusted Participants

Authors: Youssef Allouah, Rachid Guerraoui, John Stephan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on standard benchmarks validate CAFCOR's practicality, showing that privacy and robustness can coexist in distributed systems without sacrificing utility or trusting the server.
Researcher Affiliation	Academia	1EPFL, Switzerland. Alphabetical order. Correspondence to: Youssef Allouah <EMAIL>, John Stephan <EMAIL>.
Pseudocode	Yes	Algorithm 1 CAFCOR Input: Initial model θ0; DP noise levels σind, σcor; batch size b; clipping threshold C; learning rates {γt}; momentum coefficients {βt}; number of iterations T. Algorithm 2 CAF: Covariance bound-Agnostic Filter Input: vectors x1, ..., xn ∈ ℝ^d; bound on number of corrupt inputs 0 ≤ f < n
Open Source Code	No	To facilitate reproducibility, we intend to publicly release our code.
Open Datasets	Yes	We consider two widely used image classification datasets: MNIST (Le Cun & Cortes, 2010) and Fashion-MNIST (Xiao et al., 2017).
Dataset Splits	Yes	On MNIST, we use batch size b = 50, learning rate γ = 0.075, momentum parameter β = 0.85, and clipping parameter C = 2.25. For Fashion-MNIST, we use b = 100, γ = 0.3, β = 0.9, and C = 1. For both datasets, we train for T = 30 iterations and apply ℓ2-regularization at 10⁻⁴. We adopt user-level DP across all threat models.
Hardware Specification	No	The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory details). It only mentions general concepts like 'distributed environment'.
Software Dependencies	No	To estimate the privacy budgets achieved at the end of training, we use Opacus (Yousefpour et al., 2021). The citation mentions 'User-friendly differential privacy library in pytorch, 2021', implying PyTorch but no specific version number for PyTorch or Opacus is provided.
Experiment Setup	Yes	On MNIST, we use batch size b = 50, learning rate γ = 0.075, momentum parameter β = 0.85, and clipping parameter C = 2.25. For Fashion-MNIST, we use b = 100, γ = 0.3, β = 0.9, and C = 1. For both datasets, we train for T = 30 iterations and apply ℓ2-regularization at 10⁻⁴. We adopt user-level DP across all threat models. On MNIST, the privacy budgets reach ε = 26.4 and 27.8 for f = 10 and f = 5, respectively. On Fashion-MNIST, the privacy budget is ε = 39.6 for both values of f. Throughout, we set δ = 10⁻⁴.
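
For anyone attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is only an illustration, not the authors' code (which has not been released): the names `ExperimentConfig`, `REPORTED_CONFIGS`, and `REPORTED_EPSILON` are hypothetical, and the ℓ2-regularization strength and δ are taken to be 10⁻⁴, which is an assumption about the extracted exponent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    batch_size: int            # b
    learning_rate: float       # gamma
    momentum: float            # beta
    clip_threshold: float      # C
    iterations: int = 30       # T, shared by both datasets
    l2_regularization: float = 1e-4  # assumed reading of the extracted exponent
    delta: float = 1e-4              # assumed reading of the extracted exponent


# Per-dataset settings reported in the table.
REPORTED_CONFIGS = {
    "mnist": ExperimentConfig(batch_size=50, learning_rate=0.075,
                              momentum=0.85, clip_threshold=2.25),
    "fashion_mnist": ExperimentConfig(batch_size=100, learning_rate=0.3,
                                      momentum=0.9, clip_threshold=1.0),
}

# Reported privacy budgets epsilon, keyed by (dataset, f).
REPORTED_EPSILON = {
    ("mnist", 10): 26.4,
    ("mnist", 5): 27.8,
    ("fashion_mnist", 10): 39.6,
    ("fashion_mnist", 5): 39.6,
}

if __name__ == "__main__":
    for name, cfg in REPORTED_CONFIGS.items():
        print(name, cfg)
```

Keeping the reported ε values next to the configuration makes it straightforward to compare them against whatever a privacy accountant returns in a rerun. The table does not cover the noise levels σind and σcor, so they are left out here.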