Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference
Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao, Xi Chen
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The simulation results are also presented to illustrate the effectiveness of our method." (Section 4, Simulation Studies) |
| Researcher Affiliation | Academia | Jiyuan Tu, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China; Weidong Liu, School of Mathematical Sciences, School of Life Sciences and Biotechnology, MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200240, China; Xiaojun Mao, School of Data Science, Fudan University, Shanghai, 200433, China; Xi Chen, Stern School of Business, New York University, New York, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1 Robust CSL (RCSL) Method |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | No | For the linear model experiment, the data are generated as Y_i = X_i^T θ∗ + ε_i, i = 1, 2, …, n, where each X_i = (X_{i,1}, …, X_{i,p})^T is a p-dimensional covariate vector and the X_{i,j} are drawn i.i.d. from a multivariate normal distribution N(0, Σ_X). For the logistic regression model experiment, Y_i = 1 with probability L(X_i^T θ∗) and Y_i = 0 with probability 1 − L(X_i^T θ∗), i = 1, 2, …, n. |
| Dataset Splits | No | The paper uses synthetically generated data and describes its distribution across machines for a distributed inference setup, but it is a simulation-based study and does not specify traditional training/validation/test splits. |
| Hardware Specification | No | The paper describes simulation studies but does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, including libraries or solvers with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | The entire sample size is N = 1000 × (100 + 1): the data are divided across one master machine H_0 and 100 worker machines {H_1, …, H_100}, so each local sample size is n = 1000. The fraction of Byzantine machines is varied over α_n ∈ {0.05, 0.1, 0.15}, and the number of quantile levels K over {10, 20, 50, 100}; otherwise the number of quantiles K in (7) is fixed at K = 10. The dimension is p = 30 and the true coefficient vector is θ∗ = p^(−1/2)(1, (p−2)/(p−1), (p−3)/(p−1), …, 0), with noise ε_i ~ N(0, 1) and covariate means μ_x = 0 and μ_x = 0.5. Attacks: Gaussian attack N(0, 200I); omniscient attack with an extremely large scale constant (1e10); bit-flip attack flipping the sign of the first five dimensions. The tolerance parameter ε_r = 10^(−4) is used as the stopping criterion; only 4 to 8 iterations are required to stop. Results are also reported with a fixed number of iterations, T = 5 and T = 10. |
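The synthetic data generation quoted above can be sketched as follows. This is a minimal reconstruction from the table's description, not the authors' code: the coefficient pattern θ∗ = p^(−1/2)(1, (p−2)/(p−1), …, 0) and the N(0, 1) noise follow the quotes, while the identity choice for Σ_X and the function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Settings from the paper's setup: local sample size n = 1000, dimension p = 30.
n, p = 1000, 30
# theta* = p^{-1/2} * ((p-1)/(p-1), (p-2)/(p-1), ..., 0); note (p-1)/(p-1) = 1.
theta = (np.arange(p - 1, -1, -1) / (p - 1)) / np.sqrt(p)

def gen_linear(n, theta, mu_x=0.0, rng=rng):
    """Linear model: Y_i = X_i^T theta + eps_i, eps_i ~ N(0, 1).
    Sigma_X is taken to be the identity here (an assumption)."""
    p = theta.size
    X = rng.normal(mu_x, 1.0, size=(n, p))
    y = X @ theta + rng.normal(0.0, 1.0, size=n)
    return X, y

def gen_logistic(n, theta, mu_x=0.0, rng=rng):
    """Logistic model: Y_i ~ Bernoulli(L(X_i^T theta)), L(t) = 1 / (1 + e^{-t})."""
    p = theta.size
    X = rng.normal(mu_x, 1.0, size=(n, p))
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    y = (rng.random(n) < probs).astype(int)
    return X, y
```

In the paper's distributed setup, such a sample would then be split evenly across the master and the 100 workers before running the RCSL iterations.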
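The three Byzantine attacks named in the setup row can likewise be sketched. Only the Gaussian variance (200I), the omniscient scale constant (1e10), and the five flipped dimensions come from the quoted text; the function interfaces, and the assumption that the omniscient attacker sends a negated, blown-up message, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_attack(msg, rng=rng):
    """Replace a Byzantine machine's message with pure N(0, 200 I) noise."""
    return rng.normal(0.0, np.sqrt(200.0), size=msg.shape)

def omniscient_attack(msg, scale=1e10):
    """Send the message negated and scaled by an extremely large constant
    (the negation is an assumption; the 1e10 scale is from the paper)."""
    return -scale * msg

def bit_flip_attack(msg):
    """Flip the sign of the first five dimensions of the message."""
    out = msg.copy()
    out[:5] = -out[:5]
    return out
```

Under α_n ∈ {0.05, 0.1, 0.15}, the corresponding 5, 10, or 15 of the 100 worker machines would apply one of these corruptions to every message they send.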