Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Optimal Estimation and Completion of Matrices with Biclustering Structures

Authors: Chao Gao, Yu Lu, Zongming Ma, Harrison H. Zhou

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Implementation and simulation results are given in Section 5. Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data.
Researcher Affiliation	Academia	Chao Gao EMAIL Yu Lu EMAIL Yale University Zongming Ma EMAIL University of Pennsylvania Harrison H. Zhou EMAIL Yale University
Pseudocode	Yes	Algorithm 1: A Biclustering Algorithm
Open Source Code	No	The paper does not contain any explicit statements about providing open-source code or links to a code repository for the described methodology.
Open Datasets	No	Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data. We first generate our data from SBM with the number of blocks $k \in \{2, 4, 8, 16\}$. The observation rate $p = 0.5$.
Dataset Splits	No	The paper describes generating simulated data for numerical studies but does not specify any training/test/validation dataset splits. The evaluation involves comparing the estimator's error rate behavior on these generated datasets.
Hardware Specification	No	The paper provides numerical results on simulated data but does not mention any specific hardware used for running these simulations or experiments.
Software Dependencies	No	The paper mentions algorithms like 'k-means algorithm' and 'singular value decomposition' but does not specify any software names with version numbers used for implementation or analysis.
Experiment Setup	Yes	Our theoretical result indicates the rate of recovery is $\sqrt{\frac{\rho}{p}\left(\frac{k^2}{n^2} + \frac{\log k}{n}\right)}$ for the root mean squared error (RMSE) $\frac{1}{n}\|\hat{\theta} - \theta\|$. When $k$ is not too large, the dominating term is $\sqrt{\frac{\rho \log k}{pn}}$. We are going to confirm this rate by simulation. We first generate our data from SBM with the number of blocks $k \in \{2, 4, 8, 16\}$. The observation rate $p = 0.5$. For every fixed $k$, we use four different $Q = 0.5\,\mathbf{1}_k \mathbf{1}_k^T + 0.1\,t\,I_k$ with $t = 1, 2, 3, 4$ and generate the community labels $z$ uniformly on $[k]$. Then we calculate the error $\frac{1}{n}\|\hat{\theta} - \theta\|$. Panel (a) of Figure 1 shows the error versus the sample size $n$. ... We simulate data with Gaussian noise under four different settings of $k_1$ and $k_2$. For each $(k_1, k_2) \in \{(4, 4), (4, 8), (8, 8), (8, 12)\}$, the entries of matrix $Q$ are independently and uniformly generated from $\{1, 2, 3, 4, 5\}$. The cluster labels $z_1$ and $z_2$ are uniform on $[k_1]$ and $[k_2]$ respectively. After generating $Q$, $z_1$ and $z_2$, we add an $N(0, 1)$ noise to the data and observe $X_{ij}$ with probability $p = 0.1$.
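
The data-generating steps quoted under Experiment Setup can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the record above notes that none was released). The function names, the `seed` argument, the symmetric no-self-loop convention for the SBM, and the matrix sizes `n`, `n1`, `n2` are assumptions; the quoted text does not state the sample sizes. The paper's estimator (Algorithm 1) is not implemented here; `rmse` only computes the error measure for whatever estimate is supplied.

```python
import math
import random


def sbm_connectivity(k, t):
    """Connectivity matrix Q = 0.5 * 1 1^T + 0.1 * t * I_k from the quoted setup."""
    return [[0.5 + (0.1 * t if a == b else 0.0) for b in range(k)] for a in range(k)]


def simulate_sbm(n, k, t, p=0.5, seed=0):
    """Draw a partially observed SBM adjacency matrix.

    Returns (theta, X, mask, z): mean matrix, Bernoulli draws,
    observation indicators, and community labels.
    """
    rng = random.Random(seed)
    Q = sbm_connectivity(k, t)
    z = [rng.randrange(k) for _ in range(n)]  # labels uniform on [k]
    theta = [[Q[z[i]][z[j]] for j in range(n)] for i in range(n)]
    X = [[0] * n for _ in range(n)]
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # assumption: symmetric graph, no self-loops
            X[i][j] = X[j][i] = int(rng.random() < theta[i][j])
            mask[i][j] = mask[j][i] = rng.random() < p  # observed with prob. p
    return theta, X, mask, z


def simulate_bicluster(n1, n2, k1, k2, p=0.1, seed=0):
    """Gaussian biclustering setting: block means in {1..5}, N(0, 1) noise."""
    rng = random.Random(seed)
    Q = [[rng.choice([1, 2, 3, 4, 5]) for _ in range(k2)] for _ in range(k1)]
    z1 = [rng.randrange(k1) for _ in range(n1)]  # row labels uniform on [k1]
    z2 = [rng.randrange(k2) for _ in range(n2)]  # column labels uniform on [k2]
    theta = [[Q[z1[i]][z2[j]] for j in range(n2)] for i in range(n1)]
    X = [[theta[i][j] + rng.gauss(0.0, 1.0) for j in range(n2)] for i in range(n1)]
    mask = [[rng.random() < p for _ in range(n2)] for _ in range(n1)]
    return theta, X, mask, z1, z2


def rmse(theta_hat, theta):
    """Root mean squared error over all entries; equals (1/n)||.|| when n1 = n2 = n."""
    n1, n2 = len(theta), len(theta[0])
    total = sum((theta_hat[i][j] - theta[i][j]) ** 2
                for i in range(n1) for j in range(n2))
    return math.sqrt(total / (n1 * n2))
```

A call such as `simulate_sbm(n=200, k=4, t=2)` would give one replicate for a single (k, t) pair; repeating over the quoted grid of k and t values and over several hypothetical sample sizes would produce curves of the kind described for Panel (a) of Figure 1, once an estimator is plugged in.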