Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A sharp NMF result with applications in network modeling

Authors: Jiashun Jin

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We now consider some real examples. The weblog is a well-known data set [22], where with some light preprocessing, the network has 1,222 nodes (each is a blog) and 16,714 edges (each is a two-way hyperlink). The network has two communities: democratic and republican. For this data set, a rank-2 model is appropriate, so we have (n, K) = (1222, 2) (e.g., [30, 12, 18]).
Researcher Affiliation	Academia	Jiashun Jin Department of Statistics & Data Science Carnegie Mellon University Pittsburgh, PA 15213 EMAIL
Pseudocode	No	The paper describes an approach in Section 4 with numbered steps but does not present it as formal pseudocode or an algorithm block.
Open Source Code	No	The self-evaluation section states: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The data sets we use are well-known public data sets. The algorithm details are given in Section 4."
Open Datasets	Yes	The weblog is a well-known data set [22], where with some light preprocessing, the network has 1,222 nodes (each is a blog) and 16,714 edges (each is a two-way hyperlink). The network has two communities: democratic and republican. For this data set, a rank-2 model is appropriate, so we have (n, K) = (1222, 2) (e.g., [30, 12, 18]).
Dataset Splits	No	The paper discusses using datasets for analysis but does not specify train/validation/test splits, percentages, or absolute sample counts for each split.
Hardware Specification	No	The self-evaluation section states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]" The paper does not mention specific hardware used for experiments.
Software Dependencies	No	The self-evaluation section states: "Did you specify all the training details (e.g., data splits, hyper-parameters, how they were chosen)? [Yes]" However, the main text does not list specific software dependencies with version numbers.
Experiment Setup	Yes	The self-evaluation section states: "Did you specify all the training details (e.g., data splits, hyper-parameters, how they were chosen)? [Yes]" Section 4 outlines the "approach" for estimating parameters and checking conditions, which serves as the experimental setup.
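
The paper's code is not released, and the "light preprocessing" of the weblog network is not spelled out in the responses above. As a rough illustration only, the sketch below assumes a common interpretation: treat each hyperlink as an undirected edge, drop self-loops and duplicate edges, and keep the largest connected component. The function name `light_preprocess` and these steps are hypothetical, not the author's documented procedure.

```python
from collections import defaultdict, deque

def light_preprocess(edges):
    """Hypothetical preprocessing (assumed, not from the paper):
    symmetrize a directed edge list, drop self-loops and duplicates,
    and keep only the largest connected component.
    Returns (sorted node list, sorted list of undirected edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # drop self-loops
        adj[u].add(v)
        adj[v].add(u)  # a link in either direction becomes one undirected edge

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to collect one connected component.
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    queue.append(y)
        if len(comp) > len(best):
            best = comp  # keep the largest component found so far

    # Store each undirected edge once as (smaller, larger); assumes comparable node labels.
    kept = {(min(u, v), max(u, v)) for u in best for v in adj[u]}
    return sorted(best), sorted(kept)

nodes, und_edges = light_preprocess([(0, 1), (1, 0), (1, 2), (2, 2), (3, 4)])
print(len(nodes), len(und_edges))
```

On the toy input, the reciprocal pair (0, 1)/(1, 0) collapses to one edge, the self-loop on node 2 is dropped, and the smaller component {3, 4} is discarded. The source reports n = 1,222 and 16,714 edges after its preprocessing; whether this sketch reproduces those counts on the raw weblog file is not verified here.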