Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Spectrum Estimation from a Few Entries

Authors: Ashish Khetan, Sewoong Oh

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments suggest that we significantly improve upon a competing approach of using matrix completion methods, below the matrix completion threshold, above which matrix completion algorithms recover the underlying low-rank matrix exactly.
Researcher Affiliation	Academia	Department of Industrial and Enterprise Systems Engineering University of Illinois Urbana-Champaign Urbana, IL 61801, USA
Pseudocode	Yes	Algorithm 1 Schatten k-norm estimator
Open Source Code	Yes	A MATLAB implementation of the estimator (3), that includes as its sub-routines the computation of the weights of all k-cyclic pseudographs, is available for download at https://github.com/khetan2/Schatten_norm_estimation.
Open Datasets	No	The paper describes generating synthetic data for its numerical experiments, not the use of pre-existing public datasets. For example, in the description of Figure 4, it states: "M is a symmetric positive semi-definite matrix of size d = 500, and rank r = 100 (left panel) and r = 500 (right panel). Singular vectors U of M = UΣUᵀ, are generated by QR decomposition of N(0, I_{d×d}) and Σ_{i,i} is uniformly distributed over [1, 2]."
Dataset Splits	No	The paper describes theoretical analysis and numerical experiments primarily involving synthetically generated matrices and a sampling process (Erdős-Rényi sampling of matrix entries). It does not specify traditional training/test/validation dataset splits typically found in empirical machine learning studies for model training and evaluation.
Hardware Specification	No	The paper does not provide any specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the experiments. It focuses on theoretical computational complexity (e.g., O(d^α)).
Software Dependencies	No	The paper mentions "A MATLAB implementation of the estimator (3)" on page 8, but it does not specify a version number for MATLAB or any other critical software libraries or dependencies required to reproduce the experiments.
Experiment Setup	Yes	M is a symmetric positive semi-definite matrix of size d = 500, and rank r = 100 (left panel) and r = 500 (right panel). Singular vectors U of M = UΣUᵀ, are generated by QR decomposition of N(0, I_{d×d}) and Σ_{i,i} is uniformly distributed over [1, 2]. For a low rank matrix on the left, there is a clear critical value of p = 0.45, above which matrix completion is exact with high probability. We construct a symmetric matrix M of size d = 1000 and rank r = 200, σ_i ∼ Uni(0, 0.4) for 1 ≤ i ≤ r/2, and σ_i ∼ Uni(0.6, 1) for r/2 + 1 ≤ i ≤ r. We estimate r̂(P_Ω(M); c1, c2) for Erdős-Rényi sampling Ω, and a choice of c2 = 0.5 and c1 = 0.6, which is motivated by the distribution of σ_i. We use Chebyshev polynomial of degree Cb = 2, and s = 1 for qs.
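
The first synthetic setup quoted under Experiment Setup (d = 500, rank r = 100, singular values uniform on [1, 2], sampling probability p = 0.45) could be sketched in NumPy roughly as follows. This is a minimal illustration, not the authors' MATLAB code: the function names `make_psd_matrix` and `sample_symmetric_mask`, the seed, and the choice to mirror the sampling mask across the diagonal are all assumptions made here for the example.

```python
import numpy as np


def make_psd_matrix(d, r, rng):
    """Rank-r symmetric PSD matrix M = U diag(s) U^T with s_i ~ Uniform[1, 2]."""
    # Orthonormal vectors from the QR decomposition of a d x d Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    u = q[:, :r]                        # keep the first r columns
    s = rng.uniform(1.0, 2.0, size=r)   # nonzero singular values
    return (u * s) @ u.T                # scale columns of u by s, then multiply


def sample_symmetric_mask(d, p, rng):
    """Erdős-Rényi mask: keep each entry (i, j) with i <= j with probability p.

    Mirroring the mask so that (j, i) is observed whenever (i, j) is
    observed is an assumption of this sketch.
    """
    upper = np.triu(rng.random((d, d)) < p)
    return upper | upper.T


rng = np.random.default_rng(0)          # arbitrary seed, for repeatability only
d, r, p = 500, 100, 0.45
M = make_psd_matrix(d, r, rng)
mask = sample_symmetric_mask(d, p, rng)
observed = np.where(mask, M, 0.0)       # unobserved entries set to zero
```

Here `observed` plays the role of the partially observed matrix that an estimator would take as input; changing `r` to 500 gives the full-rank case described for the right panel.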