Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Matched Bipartite Block Model with Covariates

Authors: Zahra S. Razaee, Arash A. Amini, Jingyi Jessica Li

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We derive a simple fast algorithm for fitting the model based on variational inference ideas and show its effectiveness on both simulated and real data.
Researcher Affiliation	Academia	Zahra S. Razaee EMAIL Arash A. Amini EMAIL Jingyi Jessica Li EMAIL University of California, Los Angeles Department of Statistics 8125 Math Sciences Bldg., Box 951554 Los Angeles, CA 90095-1554, USA
Pseudocode	Yes	Algorithm 1: Variational block coordinate ascent for fitting mbiSBM; Algorithm 2: Bipartite Spectral Clustering (biSC)
Open Source Code	Yes	The code is available on Github (Razaee et al.).
Open Datasets	Yes	We have applied the algorithm to two Wikipedia page-user networks, which we will call Top Articles and Cities... Wikipedia usage statistics were scraped from Wikimedia Statistics using code inspired by Keegan (2014)... The first set was extracted from the Arnetminer collection, based on papers published from 1990 to 2005 in certain CS venues by Tang et al. (2012). The second set of data was scraped from DBLP... Wikimedia Statistics. URL https://stats.wikimedia.org. DBLP. Computer Science Bibliography. URL https://dblp.uni-trier.de.
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits for model evaluation, rather it describes methods like subsampling for robustness analysis. For the simulated data, the 'true labels' are known. For real data, 'true communities' are used for NMI evaluation without specifying train/test splits.
Hardware Specification	No	The paper mentions: 'This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by UCLA Institute for Digital Research and Education's Research Technology Group.' However, it does not provide specific details such as CPU/GPU models, memory, or other hardware configurations.
Software Dependencies	No	The paper mentions the 'ggmap R package by Kahle and Wickham (2013)' and 'ipapi' for geo-data, and that 'Wikipedia usage statistics were scraped from Wikimedia Statistics using code inspired by Keegan (2014)'. However, it does not provide specific version numbers for these or any other key software components used in their methodology.
Experiment Setup	Yes	Key parameters regarding covariate generation in (2) are $(\mu, \Sigma)$ for generating $v_k$. We take $\mu = 0$ and $\Sigma = \nu I_{d_1+d_2}$ throughout. Varying $\nu$ (or dimensions $d_r$) changes the information provided by the covariates (Appendix E). Larger $\nu$ causes $v_k$ to be further apart, hence covariates are more informative. $\nu = 0$ corresponds to zero covariate information. We also fix covariate noise levels at $\sigma_r = 0.5$ for $r = 1, 2$, and the network size at $N = (N_1, N_2) = (200, 800)$. Pick tolerance $\varepsilon \in (0, 1]$.
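
The covariate settings quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code. The quoted text fixes only µ = 0, Σ = νI, σ_r = 0.5 and N = (200, 800); everything else here is an assumption made for illustration: the number of communities K, the dimensions d1 and d2, uniformly drawn labels, and the reading that each node's covariates are its community center's block plus isotropic Gaussian noise. The function name `sample_covariates` is hypothetical.

```python
import numpy as np


def sample_covariates(n1=200, n2=800, K=4, d1=2, d2=2,
                      nu=1.0, sigma=(0.5, 0.5), seed=0):
    """Sketch of covariate generation under the quoted settings.

    Assumptions (not stated in the quoted text): K, d1, d2, uniform
    community labels, and the additive Gaussian noise model below.
    """
    rng = np.random.default_rng(seed)

    # Community centers v_k ~ N(0, nu * I_{d1+d2}); nu = 0 collapses
    # every center to the origin, i.e. no covariate information.
    centers = rng.normal(0.0, np.sqrt(nu), size=(K, d1 + d2))

    # Assumed: labels drawn uniformly at random on each side.
    z1 = rng.integers(0, K, size=n1)
    z2 = rng.integers(0, K, size=n2)

    # Side-r covariates: the matching block of the center plus
    # N(0, sigma_r^2 I) noise.
    x1 = centers[z1, :d1] + sigma[0] * rng.standard_normal((n1, d1))
    x2 = centers[z2, d1:] + sigma[1] * rng.standard_normal((n2, d2))
    return centers, (z1, z2), (x1, x2)


if __name__ == "__main__":
    centers, (z1, z2), (x1, x2) = sample_covariates(nu=2.0)
    print(x1.shape, x2.shape)
```

Raising `nu` spreads the centers apart relative to the fixed noise level, which is the sense in which larger ν makes the covariates more informative.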