Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Optimal Estimation and Completion of Matrices with Biclustering Structures
Authors: Chao Gao, Yu Lu, Zongming Ma, Harrison H. Zhou
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Implementation and simulation results are given in Section 5. Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data. |
| Researcher Affiliation | Academia | Chao Gao and Yu Lu, Yale University; Zongming Ma, University of Pennsylvania; Harrison H. Zhou, Yale University |
| Pseudocode | Yes | Algorithm 1: A Biclustering Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements about providing open-source code or links to a code repository for the described methodology. |
| Open Datasets | No | Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data. We first generate our data from SBM with the number of blocks k ∈ {2, 4, 8, 16}. The observation rate is p = 0.5. |
| Dataset Splits | No | The paper describes generating simulated data for numerical studies but does not specify any training/test/validation dataset splits. The evaluation involves comparing the estimator's error rate behavior on these generated datasets. |
| Hardware Specification | No | The paper provides numerical results on simulated data but does not mention any specific hardware used for running these simulations or experiments. |
| Software Dependencies | No | The paper mentions algorithms like 'k-means algorithm' and 'singular value decomposition' but does not specify any software names with version numbers used for implementation or analysis. |
| Experiment Setup | Yes | Our theoretical result indicates the rate of recovery is $\sqrt{\rho\left(\frac{k^2}{pn^2}+\frac{\log k}{pn}\right)}$ for the root mean squared error (RMSE) $\frac{1}{n}\lVert\hat{\theta}-\theta\rVert_F$. When k is not too large, the dominating term is $\frac{\rho\log k}{pn}$. We are going to confirm this rate by simulation. We first generate our data from SBM with the number of blocks k ∈ {2, 4, 8, 16}. The observation rate is p = 0.5. For every fixed k, we use four different $Q = 0.5\,\mathbf{1}_k\mathbf{1}_k^T + 0.1t\,I_k$ with t = 1, 2, 3, 4 and generate the community labels z uniformly on [k]. Then we calculate the error $\frac{1}{n}\lVert\hat{\theta}-\theta\rVert_F$. Panel (a) of Figure 1 shows the error versus the sample size n. ... We simulate data with Gaussian noise under four different settings of $k_1$ and $k_2$. For each $(k_1, k_2)$ ∈ {(4, 4), (4, 8), (8, 8), (8, 12)}, the entries of matrix Q are independently and uniformly generated from {1, 2, 3, 4, 5}. The cluster labels $z_1$ and $z_2$ are uniform on $[k_1]$ and $[k_2]$ respectively. After generating Q, $z_1$ and $z_2$, we add N(0, 1) noise to the data and observe $X_{ij}$ with probability p = 0.1. |
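
As a rough illustration of the two simulation settings quoted in the Experiment Setup row, the Python/NumPy sketch below generates the data only. The function names (`simulate_sbm`, `simulate_gaussian_bicluster`, `rmse`) are hypothetical. The sketch assumes all matrix entries are drawn independently (no symmetrization of the network), marks unobserved entries with NaN, and treats the matrix sizes as free parameters because the quoted text does not state them. It does not implement Algorithm 1 or any estimator from the paper.

```python
import numpy as np


def simulate_sbm(n, k, t, p=0.5, rng=None):
    """Bernoulli SBM with Q = 0.5 * 1_k 1_k^T + 0.1 * t * I_k, each entry observed w.p. p.

    Assumption: all n*n entries are drawn independently (no symmetrization,
    self-loops kept); unobserved entries are marked with NaN.
    """
    rng = np.random.default_rng(rng)
    Q = 0.5 * np.ones((k, k)) + 0.1 * t * np.eye(k)  # block connection probabilities
    z = rng.integers(0, k, size=n)                   # labels uniform on [k] (0-indexed)
    theta = Q[np.ix_(z, z)]                          # theta_ij = Q[z_i, z_j]
    A = rng.binomial(1, theta)                       # Bernoulli(theta_ij) entries
    mask = rng.random((n, n)) < p                    # observe each entry with probability p
    X = np.where(mask, A, np.nan)
    return X, theta


def simulate_gaussian_bicluster(n1, n2, k1, k2, p=0.1, rng=None):
    """Q uniform on {1,...,5}, uniform labels, N(0, 1) noise, each entry observed w.p. p.

    n1 and n2 are hypothetical parameters: the matrix sizes are not stated
    in the quoted setup.
    """
    rng = np.random.default_rng(rng)
    Q = rng.integers(1, 6, size=(k1, k2)).astype(float)  # iid uniform on {1,...,5}
    z1 = rng.integers(0, k1, size=n1)                     # row labels uniform on [k1]
    z2 = rng.integers(0, k2, size=n2)                     # column labels uniform on [k2]
    theta = Q[np.ix_(z1, z2)]                             # theta_ij = Q[z1_i, z2_j]
    Y = theta + rng.standard_normal((n1, n2))             # add N(0, 1) noise
    mask = rng.random((n1, n2)) < p                       # observe with probability p
    X = np.where(mask, Y, np.nan)
    return X, theta


def rmse(theta_hat, theta):
    """Frobenius error divided by sqrt(n1 * n2); equals (1/n)||theta_hat - theta||_F when square."""
    n1, n2 = theta.shape
    return np.linalg.norm(theta_hat - theta) / np.sqrt(n1 * n2)
```

For the first setting one could loop `simulate_sbm(n, k, t, p=0.5)` over k ∈ {2, 4, 8, 16} and t = 1, 2, 3, 4, then score a candidate estimate of `theta` with `rmse`; the estimator itself is deliberately left out.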