Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices

Authors: Dan Yang, Zongming Ma, Andreas Buja

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method. Keywords: Denoising, High dimensionality, Low rank matrices, Minimax rates, Simultaneously structured matrices, Sparse SVD, Sparsity. ... The theoretical results are further validated and supported by numerical experiments on synthetic data.
Researcher Affiliation	Academia	Dan Yang EMAIL Department of Statistics and Biostatistics Rutgers University Piscataway, NJ 08854, USA Zongming Ma EMAIL Andreas Buja EMAIL Department of Statistics University of Pennsylvania Philadelphia, PA 19104, USA
Pseudocode	Yes	Algorithm 1: Matrix Denoising via Two-Way Iterative Thresholding ... Algorithm 2: Initialization for Algorithm 1
Open Source Code	No	The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository or mention code in supplementary materials.
Open Datasets	No	Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method. ... In this section, we demonstrate the performance of the proposed denoising method on synthetic datasets. In the first numerical experiment, we study the effect of the signal-to-noise ratio. To this end, we fix m = 2000, n = 1000, k = l = 50, r = 10, and set the singular values of M as (d1, ..., d10) = (200, 190, ..., 120, 110).
Dataset Splits	No	The paper uses synthetic datasets generated for each experiment and discusses varying parameters for these generations (e.g., m, n, k, l, r, singular values, noise standard deviation). It mentions '100 repetitions' for statistical analysis but does not describe any train/test/validation splits for a fixed dataset, as it's generating new data for each simulation run.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies	No	The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used for the implementation or experiments.
Experiment Setup	Yes	In the first numerical experiment, we study the effect of the signal-to-noise ratio. To this end, we fix m = 2000, n = 1000, k = l = 50, r = 10, and set the singular values of M as (d1, ..., d10) = (200, 190, ..., 120, 110). The signal-to-noise ratio is varied by varying noise standard deviation σ on ten equally spaced values between 0.2 and 2. ... Throughout this section, we use (6) to estimate σ, Algorithm 2 with α = 4 to compute V(0) and (13) to select the rank. In Algorithm 1, we set β = 3 and we terminate the iteration once (5) holds with ϵ = 10^{-10}. The thresholding function η is fixed to be hard thresholding η(x, t) = x · 1{|x| > t}.
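
The quoted setup for the first experiment can be sketched in Python as follows. This is a minimal illustration rather than the authors' code, which the paper does not release. The dimensions, singular values, noise grid, and hard thresholding rule come from the Experiment Setup row. The quoted text does not say how the sparse singular vectors are drawn, so the helper `sparse_orthonormal` (a hypothetical name) assumes a random row support with orthonormalized Gaussian entries. The noise is assumed to be i.i.d. Gaussian. Algorithms 1 and 2 themselves are not reproduced here.

```python
import numpy as np


def hard_threshold(x, t):
    """Hard thresholding eta(x, t) = x * 1{|x| > t}, applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > t, x, 0.0)


# Settings quoted in the Experiment Setup row.
m, n, k, l, r = 2000, 1000, 50, 50, 10
singular_values = np.arange(200, 100, -10, dtype=float)  # 200, 190, ..., 110
sigmas = np.linspace(0.2, 2.0, 10)  # ten equally spaced noise levels


def sparse_orthonormal(dim, support_size, rank, rng):
    """ASSUMPTION (not from the source): pick a random row support and
    fill it with an orthonormalized Gaussian block; other rows are zero."""
    rows = rng.choice(dim, size=support_size, replace=False)
    q, _ = np.linalg.qr(rng.standard_normal((support_size, rank)))
    out = np.zeros((dim, rank))
    out[rows] = q
    return out


def simulate(sigma, rng):
    """Return (signal, noisy) where signal = U diag(d) V^T is sparse and
    low rank, and noisy adds i.i.d. N(0, sigma^2) noise (assumed model)."""
    u = sparse_orthonormal(m, k, r, rng)
    v = sparse_orthonormal(n, l, r, rng)
    signal = (u * singular_values) @ v.T
    noisy = signal + sigma * rng.standard_normal((m, n))
    return signal, noisy
```

Calling `simulate(sigmas[0], np.random.default_rng(0))` would produce one synthetic instance at the lowest noise level; the summary mentions 100 repetitions, which would correspond to repeating this call with independent draws for each value in `sigmas`.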