Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robust High-Dimensional Low-Rank Matrix Estimation: Optimal Rate and Data-Adaptive Tuning
Authors: Xiaolong Cui, Lei Shi, Wei Zhong, Changliang Zou
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results indicate that the new estimator can be highly competitive among existing methods, especially for heavy-tailed or skewed errors. Keywords: heavy-tailed error, high dimension, low-rank matrix, non-asymptotic bounds, robustness, tuning parameter selection. 4. Simulation. In this section, we investigate the performance of our proposed rank matrix lasso (RML) estimator through Monte Carlo simulations. The simulation results are evaluated through 100 Monte Carlo replications. 5. Real Data Analysis. This section is devoted to a numerical study based on the well-known Arabidopsis thaliana data. |
| Researcher Affiliation | Academia | Xiaolong Cui EMAIL School of Statistics and Data Science, Nankai University, China. Lei Shi EMAIL Department of Biostatistics, University of California, Berkeley, U.S.A. Wei Zhong EMAIL WISE and Department of Statistics and Data Science, SOE, Xiamen University, China. Changliang Zou EMAIL School of Statistics and Data Sciences, LPMC, KLMDASR and LEBPS, Nankai University, China. |
| Pseudocode | Yes | Appendix G: Algorithms and Complexity Analysis. G1: Proximal Gradient Algorithm for the Rank Matrix Lasso. Algorithm 1: Accelerated proximal gradient algorithm for the rank matrix lasso. Appendix H2: Comparison of First-Order and Second-Order Algorithms. Algorithm 2: Proximal quasi-Newton algorithm for the rank matrix lasso. |
| Open Source Code | No | Our implementation is based on a proximal gradient algorithm which can be found in Appendix G. In our large-scale implementation we use the PROPACK package by Larsen (2004) to achieve this convenience, rendering our algorithm perfectly scalable to large matrix computation problems (m1, m2 as high as 5 × 10^4). The paper does not provide an explicit statement of code release for their own implementation or a direct link to a code repository. |
| Open Datasets | Yes | This section is devoted to a numerical study based on the well-known Arabidopsis thaliana data, which monitors the expression levels of a group of genes contributing to the generation of isoprenoids under different experimental conditions. See Wille et al. (2004) and She and Chen (2017) for detailed description. |
| Dataset Splits | Yes | We consider a multivariate regression model for the data, using genes from upstream pathways as predictors and the downstream genes as responses. Here, for the sake of comparison, we again consider the four estimators mentioned in Section 4, trained over 80% of the data, Y_train, and calculate the prediction accuracy based on the remaining data serving as a test set, Y_test. Concretely speaking, the accuracy for the prediction Y_pre is measured by two prediction errors, mean absolute deviation (MAD) and mean square error (MSE), as follows: MAD = ‖Y_pre − Y_test‖_{1,1} / (m1 · n_test), MSE = ‖Y_pre − Y_test‖_F^2 / (m1 · n_test). Here ‖·‖_{1,1} simply gives the summation of the absolute values of all the entries of a given matrix. For matrix lasso, regularized LAD and our RML we apply the pivotal tuning procedure with α = 0.2 (for matrix lasso, we simply assume the error follows a standard normal distribution), and for robustified matrix lasso, we determine the tuning parameter using robust cross-validation (Fan et al., 2021). We repeat the splitting step 100 times and report the average prediction error and the standard error. |
| Hardware Specification | No | The paper mentions computation time in Table 1 and discusses computational efficiency in Appendix G2 but does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | In our large-scale implementation we use the PROPACK package by Larsen (2004) to achieve this convenience. The paper mentions the PROPACK package but does not provide a specific version number for it or any other software dependencies. |
| Experiment Setup | Yes | The tuning parameter λ given by (8) is obtained by simulation based on 100 repetitions with α0 = 0.2. The tuning parameter of robustified matrix lasso is given by the RCV introduced in Fan et al. (2021). We use ℓ2, Robust ℓ2, ℓ1 and RML to denote the four methods, respectively. Appendix G: Algorithms and Complexity Analysis. In our simulations, the tol we picked is 10^{-4} for all the trials, and the maximal iteration is T = 100. We take L(0) = 10^{-4} L_max, and set L_max = 3 × 10^p, which is empirically found good enough for convergence in most scenarios. |
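The accelerated proximal gradient scheme referenced in the Pseudocode and Experiment Setup rows can be illustrated with a generic sketch. This is not the paper's exact Algorithm 1; the function names (`svt`, `apg_nuclear`), the FISTA-style momentum, and the fixed step size are illustrative assumptions. The key ingredient is that the proximal operator of the nuclear norm is singular value soft-thresholding:

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def apg_nuclear(grad_f, X0, L, lam, T=100, tol=1e-4):
    """FISTA-style accelerated proximal gradient (sketch) for
    min_X f(X) + lam * ||X||_*, where grad_f is the gradient of a
    smooth loss f with Lipschitz constant L."""
    X, Y, t = X0.copy(), X0.copy(), 1.0
    for _ in range(T):
        # Proximal (singular value thresholding) step at the extrapolated point.
        X_new = svt(Y - grad_f(Y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum extrapolation.
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)
        # Relative-change stopping rule (mirrors a tol-based criterion).
        if np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X)):
            X = X_new
            break
        X, t = X_new, t_new
    return X
```

For a sanity check, the denoising problem min_X 0.5‖X − M‖_F² + λ‖X‖_* has the closed-form solution `svt(M, lam)`, which the iteration recovers with `grad_f = lambda X: X - M` and `L = 1`.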
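The repeated 80/20 split evaluation quoted in the Dataset Splits row can be sketched as follows. `fit` is a placeholder for any of the four estimators (the paper's RML is not reimplemented here), and the MAD/MSE definitions follow the formulas quoted in that row, i.e. entrywise averages of absolute and squared prediction errors:

```python
import numpy as np

def mad_mse(Y_pre, Y_test):
    """MAD = ||Y_pre - Y_test||_{1,1} / (#entries);
    MSE = ||Y_pre - Y_test||_F^2 / (#entries)."""
    diff = np.asarray(Y_pre) - np.asarray(Y_test)
    return np.abs(diff).mean(), (diff ** 2).mean()

def repeated_split_eval(X, Y, fit, frac=0.8, reps=100, seed=0):
    """Average MAD/MSE (with standard errors) over random train/test
    splits of the n samples (rows). `fit(X_tr, Y_tr)` must return a
    coefficient matrix B such that predictions are X_te @ B."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(frac * n)
    mads, mses = [], []
    for _ in range(reps):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        B = fit(X[tr], Y[tr])
        mad, mse = mad_mse(X[te] @ B, Y[te])
        mads.append(mad)
        mses.append(mse)
    mads, mses = np.array(mads), np.array(mses)
    return (mads.mean(), mads.std(ddof=1) / np.sqrt(reps),
            mses.mean(), mses.std(ddof=1) / np.sqrt(reps))
```

With a plain least-squares `fit` (e.g. `np.linalg.lstsq`) and noiseless data this reproduces near-zero prediction errors, which is a quick correctness check before plugging in a regularized estimator.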