Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FedRACE: A Hierarchical and Statistical Framework for Robust Federated Learning

Authors: Gang Yan, Sikai Yang, Wan Du

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implement FEDRACE on the FedScale platform and evaluate it on CIFAR-100, Food-101, and Tiny ImageNet under diverse attack scenarios. FEDRACE achieves a true positive rate of up to 99.3% with a false positive rate below 1.2%, while preserving model accuracy and improving generalization.
Researcher Affiliation	Academia	Gang Yan, Sikai Yang, Wan Du. University of California, Merced. Merced, CA, United States. EMAIL
Pseudocode	Yes	A.1 Algorithm Overview of the Multi-step Voting. Algorithm 1: Multi-step Voting
Open Source Code	No	Upon acceptance, we will release the source code, training scripts, and detailed instructions as part of the supplementary material.
Open Datasets	Yes	We evaluate our framework on three widely used image classification benchmarks: CIFAR-100 [28], Food-101 [27], and Tiny ImageNet [25].
Dataset Splits	Yes	CIFAR-100 contains 100 classes with 600 images per class, divided into 500 training and 100 testing samples. Food-101 includes 101 food categories with 750 training and 250 testing images per class... Tiny ImageNet ... consisting of 200 classes with 500 training and 50 validation images per class... Client datasets are partitioned in a non-IID manner using a Dirichlet distribution with concentration parameter α = 0.5.
Hardware Specification	Yes	All experiments are conducted on NVIDIA RTX A4500 GPUs and repeated with four random seeds (1, 12, 123, 1234).
Software Dependencies	No	The system is implemented using the FedScale platform [26] with PyTorch [61], and leverages GPU acceleration for efficient training and inference.
Experiment Setup	Yes	Local training uses a learning rate of 0.001, batch size 128, and three epochs per round across all datasets. All experiments are conducted on NVIDIA RTX A4500 GPUs and repeated with four random seeds (1, 12, 123, 1234).
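
The Dataset Splits and Experiment Setup rows mention a Dirichlet-based non-IID client partition (α = 0.5) and four random seeds (1, 12, 123, 1234). Since the source code is not released, the following is only a minimal sketch of one common way such a partition is built: per-class Dirichlet proportions over clients. The function name `dirichlet_partition`, the client count of 10, and the toy label array are assumptions made for illustration; they are not taken from the paper, and the authors' actual partitioning procedure may differ.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=1):
    """Assign sample indices to clients using per-class Dirichlet proportions.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        # Shuffle the indices of this class so each client gets a random subset.
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Draw the share of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative shares give the cut points inside this class.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices


# Toy stand-in for a CIFAR-100-sized training set: 100 classes x 500 samples.
toy_labels = np.repeat(np.arange(100), 500)

# Seeds reported in the table; the client count (10) is an assumption.
SEEDS = (1, 12, 123, 1234)
partitions = {
    seed: dirichlet_partition(toy_labels, num_clients=10, alpha=0.5, seed=seed)
    for seed in SEEDS
}
```

A smaller α concentrates each class on fewer clients, so α = 0.5 gives noticeably skewed label distributions per client. Fixing the seed makes the partition repeatable across runs.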