Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference
Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao, Xi Chen
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The simulation results are also presented to illustrate the effectiveness of our method." (Section 4, Simulation Studies) |
| Researcher Affiliation | Academia | Jiyuan Tu, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China; Weidong Liu, School of Mathematical Sciences, School of Life Sciences and Biotechnology, MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200240, China; Xiaojun Mao, School of Data Science, Fudan University, Shanghai, 200433, China; Xi Chen, Stern School of Business, New York University, New York, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1 Robust CSL (RCSL) Method |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | For the linear model experiment, the data are generated as Y_i = X_i^T θ∗ + ε_i, i = 1, 2, …, n, where each X_i = (X_{i,1}, …, X_{i,p})^T is a p-dimensional covariate vector and the X_{i,j} are drawn i.i.d. from a multivariate normal distribution N(0, Σ_X). For the logistic regression model experiment, Y_i = 1 with probability L(X_i^T θ∗) and Y_i = 0 with probability 1 − L(X_i^T θ∗), i = 1, 2, …, n. |
| Dataset Splits | No | The paper uses synthetically generated data and describes its distribution across machines for a distributed inference setup, but it is a simulation-based study and does not specify traditional training/validation/test splits. |
| Hardware Specification | No | The paper describes simulation studies but does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, including libraries or solvers with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | The entire sample size is N = 1000 × (100 + 1): the data are divided across one master machine H_0 and 100 worker machines {H_1, …, H_100}, so each local sample size is n = 1000. The fraction of Byzantine machines is varied over α_n ∈ {0.05, 0.1, 0.15}, and the number of quantile levels K over {10, 20, 50, 100}; otherwise the number of quantiles K in (7) is fixed at K = 10. The dimension is p = 30 and the true coefficient vector is θ∗ = p^(−1/2)(1, (p−2)/(p−1), (p−3)/(p−1), …, 0), with noise ε_i ~ N(0, 1) and covariate means μ_x = 0 and μ_x = 0.5. Attacks: Gaussian attack N(0, 200I); omniscient attack with an extremely large scale constant (1e10); bit-flip attack flipping the sign of the first five dimensions. The tolerance parameter ε_r = 10^(−4) is used as the stopping criterion; only 4 to 8 iterations are required to stop. Results are also reported with a fixed number of iterations, T = 5 and T = 10. |
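The synthetic data generation quoted above can be sketched as follows. This is a minimal reconstruction from the table's description, not the authors' code: the coefficient pattern θ∗ = p^(−1/2)(1, (p−2)/(p−1), …, 0) and the N(0, 1) noise follow the quotes, while the identity choice for Σ_X and the function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Settings from the paper's setup: local sample size n = 1000, dimension p = 30.
n, p = 1000, 30
# theta* = p^{-1/2} * ((p-1)/(p-1), (p-2)/(p-1), ..., 0); note (p-1)/(p-1) = 1.
theta = (np.arange(p - 1, -1, -1) / (p - 1)) / np.sqrt(p)

def gen_linear(n, theta, mu_x=0.0, rng=rng):
    """Linear model: Y_i = X_i^T theta + eps_i, eps_i ~ N(0, 1).
    Sigma_X is taken to be the identity here (an assumption)."""
    p = theta.size
    X = rng.normal(mu_x, 1.0, size=(n, p))
    y = X @ theta + rng.normal(0.0, 1.0, size=n)
    return X, y

def gen_logistic(n, theta, mu_x=0.0, rng=rng):
    """Logistic model: Y_i ~ Bernoulli(L(X_i^T theta)), L(t) = 1 / (1 + e^{-t})."""
    p = theta.size
    X = rng.normal(mu_x, 1.0, size=(n, p))
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    y = (rng.random(n) < probs).astype(int)
    return X, y
```

In the paper's distributed setup, such a sample would then be split evenly across the master and the 100 workers before running the RCSL iterations.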
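The three Byzantine attacks named in the setup row can likewise be sketched. Only the Gaussian variance (200I), the omniscient scale constant (1e10), and the five flipped dimensions come from the quoted text; the function interfaces, and the assumption that the omniscient attacker sends a negated, blown-up message, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_attack(msg, rng=rng):
    """Replace a Byzantine machine's message with pure N(0, 200 I) noise."""
    return rng.normal(0.0, np.sqrt(200.0), size=msg.shape)

def omniscient_attack(msg, scale=1e10):
    """Send the message negated and scaled by an extremely large constant
    (the negation is an assumption; the 1e10 scale is from the paper)."""
    return -scale * msg

def bit_flip_attack(msg):
    """Flip the sign of the first five dimensions of the message."""
    out = msg.copy()
    out[:5] = -out[:5]
    return out
```

Under α_n ∈ {0.05, 0.1, 0.15}, the corresponding 5, 10, or 15 of the 100 worker machines would apply one of these corruptions to every message they send.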