Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Tension between Byzantine Robustness and No-Attack Accuracy in Distributed Learning
Authors: Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we will empirically test the effect of using robust aggregators when there are no Byzantine workers. Specifically, we use ByzSGD with various robust aggregators to train a ResNet-20 (He et al., 2016) deep learning model on the CIFAR-10 dataset (Krizhevsky et al., 2009) for 160 epochs without attacks. All the experiments are conducted on a distributed platform with 16 Docker containers serving as workers and an extra Docker container as the server. Each Docker container is bound to an NVIDIA TITAN Xp GPU. We test the performance of each method when the training instances are randomly distributed to the workers according to the Dirichlet distribution with hyperparameter α = 0.1, 1.0, and 10.0, respectively. A smaller α will lead to a more heterogeneous data distribution. Moreover, the batch normalization (BN) layers in the ResNet-20 model are replaced with group normalization layers since BN layers have a poor performance with heterogeneous data across workers (Wu & He, 2018). All algorithms are implemented with PyTorch 1.3. |
| Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, School of Computer Science, Nanjing University, Nanjing, China. Correspondence to: Wu-Jun Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Byzantine-Robust Gradient Descent (ByzGD). Input: iteration number T, learning rates {η_t}_{t=0}^{T−1}, robust aggregator Agg(·); Initialization: model parameter w_0; for t = 0 to T−1 do: broadcast w_t to all workers; on worker i ∈ {1, …, n} in parallel do: compute local gradient g_i = ∇F_i(w_t); send g_i to the server; end on worker; compute w_{t+1} = w_t − η_t Agg(g_1, …, g_n); end for. Output: model parameter w_T. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Specifically, we use ByzSGD with various robust aggregators to train a ResNet-20 (He et al., 2016) deep learning model on the CIFAR-10 dataset (Krizhevsky et al., 2009) for 160 epochs without attacks. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 dataset but does not explicitly provide the training, testing, or validation split percentages or sample counts. It only describes how training instances are distributed among workers. |
| Hardware Specification | Yes | All the experiments are conducted on a distributed platform with 16 Docker containers serving as workers and an extra Docker container as the server. Each Docker container is bound to an NVIDIA TITAN Xp GPU. |
| Software Dependencies | Yes | All algorithms are implemented with PyTorch 1.3. |
| Experiment Setup | Yes | We use cross-entropy as the loss function, set the batch size on each worker to 16, and use the cosine annealing learning rates (Loshchilov & Hutter, 2017). Specifically, the learning rate at the p-th epoch is η_p = (1 + cos(pπ/160))/2 · η_0 for p = 0, 1, …, 159. The initial learning rate η_0 is selected from {0.1, 0.2, 0.5, 1.0}, and the best final top-1 test accuracy is used as the final metric. Local momentum is used with the momentum hyperparameter set to 0.9. |
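The Dirichlet-based heterogeneous data partitioning described in the extracted experiment text can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name and the per-class splitting strategy are assumptions; only the hyperparameter values (16 workers, α ∈ {0.1, 1.0, 10.0}) come from the paper.

```python
import numpy as np

def dirichlet_partition(labels, n_workers=16, alpha=0.1, seed=0):
    """Assign sample indices to workers via per-class Dirichlet draws.

    A smaller alpha yields a more heterogeneous (non-IID) distribution
    of classes across workers, matching the paper's description.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    worker_indices = [[] for _ in range(n_workers)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each worker.
        props = rng.dirichlet(alpha * np.ones(n_workers))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for w, part in enumerate(np.split(idx, cuts)):
            worker_indices[w].extend(part.tolist())
    return worker_indices

# Toy example: 10 classes (as in CIFAR-10), 1000 dummy labels.
labels = np.arange(1000) % 10
parts = dirichlet_partition(labels, n_workers=16, alpha=0.1)
```

With α = 0.1 most workers end up dominated by a few classes, while α = 10.0 gives a near-uniform class mix per worker.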
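The update rule in Algorithm 1 (ByzGD) can be illustrated with a minimal server-side step. Coordinate-wise median is used here as one example robust aggregator; the paper evaluates several aggregators, and this particular choice is an assumption for the sketch.

```python
import numpy as np

def coordinate_median(grads):
    """Example robust aggregator: coordinate-wise median of worker gradients."""
    return np.median(np.stack(grads), axis=0)

def byzgd_step(w, worker_grads, eta, agg=coordinate_median):
    """One ByzGD update: w_{t+1} = w_t - eta * Agg(g_1, ..., g_n)."""
    return w - eta * agg(worker_grads)

# Toy example: 2-D parameter, 4 workers, one of them sending an outlier
# gradient (as a Byzantine worker might).
w = np.zeros(2)
grads = [np.array([1.0, 1.0])] * 3 + [np.array([100.0, -100.0])]
w_next = byzgd_step(w, grads, eta=0.1)
```

The median suppresses the outlier, so the step follows the three honest gradients; a plain mean would instead be dragged far off by the single corrupted worker.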
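The cosine annealing schedule η_p = (1 + cos(pπ/160))/2 · η_0 from the experiment setup can be reproduced in a few lines. η_0 = 0.1 below is just one of the four candidate values the paper selects from.

```python
import math

def cosine_lr(epoch, eta0, total_epochs=160):
    """Cosine-annealed learning rate: eta_p = (1 + cos(p*pi/T)) / 2 * eta0."""
    return (1 + math.cos(epoch * math.pi / total_epochs)) / 2 * eta0

# Full 160-epoch schedule for one candidate initial rate.
rates = [cosine_lr(p, 0.1) for p in range(160)]
```

The schedule starts at η_0, halves by the midpoint (epoch 80), and decays smoothly toward zero by the final epoch.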