Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robust High-Dimensional Low-Rank Matrix Estimation: Optimal Rate and Data-Adaptive Tuning
Authors: Xiaolong Cui, Lei Shi, Wei Zhong, Changliang Zou
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results indicate that the new estimator can be highly competitive among existing methods, especially for heavy-tailed or skewed errors. Keywords: heavy-tailed error, high dimension, low-rank matrix, non-asymptotic bounds, robustness, tuning parameter selection. 4. Simulation. In this section, we investigate the performance of our proposed rank matrix lasso (RML) estimator through Monte Carlo simulations. The simulation results are evaluated through 100 Monte Carlo replications. 5. Real Data Analysis. This section is devoted to a numerical study based on the well-known Arabidopsis thaliana data. |
| Researcher Affiliation | Academia | Xiaolong Cui EMAIL School of Statistics and Data Science, Nankai University, China. Lei Shi EMAIL Department of Biostatistics, University of California, Berkeley, U.S.A. Wei Zhong EMAIL WISE and Department of Statistics and Data Science, SOE, Xiamen University, China. Changliang Zou EMAIL School of Statistics and Data Sciences, LPMC, KLMDASR and LEBPS, Nankai University, China. |
| Pseudocode | Yes | Appendix G: Algorithms and Complexity Analysis. G1: Proximal Gradient Algorithm for the Rank Matrix Lasso. Algorithm 1: Accelerated proximal gradient algorithm for the rank matrix lasso. Appendix H2: Comparison of First-Order and Second-Order Algorithms. Algorithm 2: Proximal quasi-Newton algorithm for the rank matrix lasso. |
| Open Source Code | No | Our implementation is based on a proximal gradient algorithm which can be found in Appendix G. In our large-scale implementation we use the PROPACK package by Larsen (2004) to achieve this convenience, rendering our algorithm perfectly scalable to large matrix computation problems (m1, m2 as high as 5 × 10^4). The paper does not provide an explicit statement of code release for their own implementation or a direct link to a code repository. |
| Open Datasets | Yes | This section is devoted to a numerical study based on the well-known Arabidopsis thaliana data, which monitors the expression levels of a group of genes contributing to the generation of isoprenoids under different experimental conditions. See Wille et al. (2004) and She and Chen (2017) for detailed description. |
| Dataset Splits | Yes | We consider a multivariate regression model for the data, using genes from upstream pathways as predictors and the downstream genes as responses. Here, for the sake of comparison, we again consider the four estimators mentioned in Section 4, trained over 80% of the data, Y_train, and calculate the prediction accuracy based on the remaining data serving as a test set, Y_test. Concretely speaking, the accuracy for the prediction Y_pre is measured by two prediction errors, mean absolute deviation (MAD) and mean square error (MSE), as follows: MAD = ‖Y_pre − Y_test‖_{1,1} / (m1 · n_test), MSE = ‖Y_pre − Y_test‖_F^2 / (m1 · n_test). Here ‖·‖_{1,1} simply gives the summation of the absolute values of all the entries of a given matrix. For matrix lasso, regularized LAD and our RML we apply the pivotal tuning procedure with α = 0.2 (for matrix lasso, we simply assume the error follows a standard normal distribution), and for robustified matrix lasso, we determine the tuning parameter using robust cross-validation (Fan et al., 2021). We repeat the splitting step 100 times and report the average prediction error and the standard error. |
| Hardware Specification | No | The paper mentions computation time in Table 1 and discusses computational efficiency in Appendix G2 but does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | In our large-scale implementation we use the PROPACK package by Larsen (2004) to achieve this convenience. The paper mentions the PROPACK package but does not provide a specific version number for it or any other software dependencies. |
| Experiment Setup | Yes | The tuning parameter λ given by (8) is obtained by simulation based on 100 repetitions with α0 = 0.2. The tuning parameter of robustified matrix lasso is given by the RCV introduced in Fan et al. (2021). We use ℓ2, Robust ℓ2, ℓ1 and RML to denote the four methods, respectively. Appendix G: Algorithms and Complexity Analysis. In our simulations, the tol we picked is 10^{-4} for all the trials, and the maximal iteration is T = 100. We take L(0) = 10^{-4} L_max, and set L_max = 3 × 10^p, which is empirically found good enough for convergence in most scenarios. |
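The accelerated proximal gradient scheme referenced in the Pseudocode and Experiment Setup rows can be illustrated with a generic sketch. This is not the paper's exact Algorithm 1; the function names (`svt`, `apg_nuclear`), the FISTA-style momentum, and the fixed step size are illustrative assumptions. The key ingredient is that the proximal operator of the nuclear norm is singular value soft-thresholding:

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def apg_nuclear(grad_f, X0, L, lam, T=100, tol=1e-4):
    """FISTA-style accelerated proximal gradient (sketch) for
    min_X f(X) + lam * ||X||_*, where grad_f is the gradient of a
    smooth loss f with Lipschitz constant L."""
    X, Y, t = X0.copy(), X0.copy(), 1.0
    for _ in range(T):
        # Proximal (singular value thresholding) step at the extrapolated point.
        X_new = svt(Y - grad_f(Y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum extrapolation.
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)
        # Relative-change stopping rule (mirrors a tol-based criterion).
        if np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X)):
            X = X_new
            break
        X, t = X_new, t_new
    return X
```

For a sanity check, the denoising problem min_X 0.5‖X − M‖_F² + λ‖X‖_* has the closed-form solution `svt(M, lam)`, which the iteration recovers with `grad_f = lambda X: X - M` and `L = 1`.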
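The repeated 80/20 split evaluation quoted in the Dataset Splits row can be sketched as follows. `fit` is a placeholder for any of the four estimators (the paper's RML is not reimplemented here), and the MAD/MSE definitions follow the formulas quoted in that row, i.e. entrywise averages of absolute and squared prediction errors:

```python
import numpy as np

def mad_mse(Y_pre, Y_test):
    """MAD = ||Y_pre - Y_test||_{1,1} / (#entries);
    MSE = ||Y_pre - Y_test||_F^2 / (#entries)."""
    diff = np.asarray(Y_pre) - np.asarray(Y_test)
    return np.abs(diff).mean(), (diff ** 2).mean()

def repeated_split_eval(X, Y, fit, frac=0.8, reps=100, seed=0):
    """Average MAD/MSE (with standard errors) over random train/test
    splits of the n samples (rows). `fit(X_tr, Y_tr)` must return a
    coefficient matrix B such that predictions are X_te @ B."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(frac * n)
    mads, mses = [], []
    for _ in range(reps):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        B = fit(X[tr], Y[tr])
        mad, mse = mad_mse(X[te] @ B, Y[te])
        mads.append(mad)
        mses.append(mse)
    mads, mses = np.array(mads), np.array(mses)
    return (mads.mean(), mads.std(ddof=1) / np.sqrt(reps),
            mses.mean(), mses.std(ddof=1) / np.sqrt(reps))
```

With a plain least-squares `fit` (e.g. `np.linalg.lstsq`) and noiseless data this reproduces near-zero prediction errors, which is a quick correctness check before plugging in a regularized estimator.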