Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sketched Gaussian Mechanism for Private Federated Learning

Authors: Qiaobo Li, Zhijie Chen, Arindam Banerjee

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results confirm that at the same privacy level, SGM based FL is at least competitive with non-sketching private FL variants and outperforms them in some settings.
Researcher Affiliation	Academia	Qiaobo Li Siebel School of Computing and Data Science University of Illinois Urbana-Champaign EMAIL; Zhijie Chen Siebel School of Computing and Data Science University of Illinois Urbana-Champaign EMAIL; Arindam Banerjee Siebel School of Computing and Data Science University of Illinois Urbana-Champaign EMAIL
Pseudocode	Yes	Algorithm 1 Sketched Gaussian Mechanism; Algorithm 2 Fed-SGM
Open Source Code	Yes	All experiments in this section and Section 4 are conducted on a computing cluster with an AMD EPYC 7713 64-core processor and an NVIDIA A100 Tensor Core GPU, and the code is provided at https://github.com/lucmonl/mlopt/tree/main.
Open Datasets	Yes	For the vision task, we use the full EMNIST By Class dataset, which comprises 814K training samples and 140K testing samples across 62 classes... For the language task, we use the SST-2 dataset from the GLUE benchmark [69], which comprises 67349 training samples and 1821 test samples across two sentiment classes.
Dataset Splits	Yes	For the vision task, we use the full EMNIST By Class dataset, which comprises 814K training samples and 140K testing samples across 62 classes... For the language task, we use the SST-2 dataset from the GLUE benchmark [69], which comprises 67349 training samples and 1821 test samples across two sentiment classes.
Hardware Specification	Yes	All experiments in this section and Section 4 are conducted on a computing cluster with an AMD EPYC 7713 64-core processor and an NVIDIA A100 Tensor Core GPU, and the code is provided at https://github.com/lucmonl/mlopt/tree/main.
Software Dependencies	No	The paper mentions training deep learning models (ResNet101, BERT-Base) and discusses optimizers (GD, Adam, AMSGrad), but does not specify software versions for programming languages or libraries (e.g., Python, PyTorch, TensorFlow, CUDA versions) used in the implementation.
Experiment Setup	Yes	We deploy C = 625 clients in total, sampling N = 4 clients uniformly at random in each communication round. Each selected client executes K = 18 local SGD updates on mini-batches of size 64, with gradient clipping threshold τ = 1. Sketching dimension and total rounds are chosen per task: for the vision task, we set b = 4 × 10^5 (approximately 1% compression rate) and run T = 500 communication rounds; for the language task, we use b = 2 × 10^5 (approximately 0.2% compression rate) and T = 200 rounds. For privacy, we fix the parameter δ_p = 10^-5. For both tasks, we consider noise scales σ_g ∈ {0.8, 1, 2, 4} for the unsketched algorithms.
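
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which also makes it easy to sanity-check the stated compression rates. The following is a minimal sketch, not the authors' code: the class name `FedSetup`, its field names, and the approximate model parameter counts are assumptions introduced here for illustration. The pairing of ResNet101 with the vision task and BERT-Base with the language task is also assumed, since the table names the models but does not tie them to tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FedSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    task: str
    sketch_dim: int            # b, the sketching dimension
    rounds: int                # T, communication rounds
    approx_model_params: int   # ASSUMPTION: rough parameter count, not stated in the table
    total_clients: int = 625   # C
    clients_per_round: int = 4 # N
    local_steps: int = 18      # K
    batch_size: int = 64
    clip_threshold: float = 1.0  # tau
    delta: float = 1e-5          # delta_p

    def compression_rate(self) -> float:
        """Ratio of sketching dimension to (assumed) model dimension."""
        return self.sketch_dim / self.approx_model_params

    def client_sampling_rate(self) -> float:
        """Fraction of clients sampled in each round."""
        return self.clients_per_round / self.total_clients

    def samples_per_client_round(self) -> int:
        """Mini-batch samples processed by one selected client in one round."""
        return self.local_steps * self.batch_size


# ASSUMPTION: roughly 42.6M parameters for ResNet101 with a 62-class head,
# and roughly 110M parameters for BERT-Base.
vision = FedSetup("vision", sketch_dim=400_000, rounds=500,
                  approx_model_params=42_600_000)
language = FedSetup("language", sketch_dim=200_000, rounds=200,
                    approx_model_params=110_000_000)

# Noise scales considered for the unsketched algorithms.
noise_scales = (0.8, 1, 2, 4)

if __name__ == "__main__":
    for setup in (vision, language):
        print(f"{setup.task}: compression rate ~ {setup.compression_rate():.2%}")
```

Under these assumed parameter counts, the computed ratios land close to the "approximately 1%" and "approximately 0.2%" figures quoted in the row, which is consistent with reading the sketching dimensions as 4 × 10^5 and 2 × 10^5.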