Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A sharp NMF result with applications in network modeling
Authors: Jiashun Jin
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now consider some real examples. The weblog is a well-known data set [22], where with some light preprocessing, the network has 1,222 node (each is a blog) and 16,714 edges (each is a two-way hyperlink). The network has two communities: democratic and republican. For this data set, a rank-2 model is appropriate, so we have (n, K) = (1,222, 2) (e.g., [30, 12, 18]). |
| Researcher Affiliation | Academia | Jiashun Jin Department of Statistics & Data Science Carnegie Mellon University Pittsburgh, PA 15213 EMAIL |
| Pseudocode | No | The paper describes an approach in Section 4 with numbered steps but does not present it as formal pseudocode or an algorithm block. |
| Open Source Code | No | The self-evaluation section states: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The data sets we use are well-known public data sets. The algorithm details are given in Section 4." |
| Open Datasets | Yes | The weblog is a well-known data set [22], where with some light preprocessing, the network has 1,222 node (each is a blog) and 16,714 edges (each is a two-way hyperlink). The network has two communities: democratic and republican. For this data set, a rank-2 model is appropriate, so we have (n, K) = (1,222, 2) (e.g., [30, 12, 18]). |
| Dataset Splits | No | The paper discusses using datasets for analysis but does not specify train/validation/test splits, percentages, or absolute sample counts for each split. |
| Hardware Specification | No | The self-evaluation section states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]" The paper does not mention specific hardware used for experiments. |
| Software Dependencies | No | The self-evaluation section states: "Did you specify all the training details (e.g., data splits, hyper-parameters, how they were chosen)? [Yes]" However, the main text does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The self-evaluation section states: "Did you specify all the training details (e.g., data splits, hyper-parameters, how they were chosen)? [Yes]" Section 4 outlines the "approach" for estimating parameters and checking conditions, which serves as the experimental setup. |