Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Matched Bipartite Block Model with Covariates
Authors: Zahra S. Razaee, Arash A. Amini, Jingyi Jessica Li
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We derive a simple fast algorithm for fitting the model based on variational inference ideas and show its effectiveness on both simulated and real data. |
| Researcher Affiliation | Academia | Zahra S. Razaee, Arash A. Amini, Jingyi Jessica Li. University of California, Los Angeles, Department of Statistics, 8125 Math Sciences Bldg., Box 951554, Los Angeles, CA 90095-1554, USA |
| Pseudocode | Yes | Algorithm 1: Variational block coordinate ascent for fitting mbiSBM. Algorithm 2: Bipartite Spectral Clustering (biSC). |
| Open Source Code | Yes | The code is available on GitHub (Razaee et al.). |
| Open Datasets | Yes | We have applied the algorithm to two Wikipedia page-user networks, which we will call Top Articles and Cities... Wikipedia usage statistics were scraped from Wikimedia Statistics using code inspired by Keegan (2014)... The first set was extracted from the Arnetminer collection, based on papers published from 1990 to 2005 in certain CS venues by Tang et al. (2012). The second set of data was scraped from DBLP... Wikimedia Statistics. URL https://stats.wikimedia.org. DBLP. Computer Science Bibliography. URL https://dblp.uni-trier.de. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits for model evaluation; instead, it describes methods such as subsampling for robustness analysis. For the simulated data, the 'true labels' are known. For real data, 'true communities' are used for NMI evaluation without specifying train/test splits. |
| Hardware Specification | No | The paper mentions: 'This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by UCLA Institute for Digital Research and Education's Research Technology Group.' However, it does not provide specific details such as CPU/GPU models, memory, or other hardware configurations. |
| Software Dependencies | No | The paper mentions the 'ggmap R package by Kahle and Wickham (2013)' and 'ipapi' for geo-data, and that 'Wikipedia usage statistics were scraped from Wikimedia Statistics using code inspired by Keegan (2014)'. However, it does not provide specific version numbers for these or any other key software components used in their methodology. |
| Experiment Setup | Yes | Key parameters regarding covariate generation in (2) are $(\mu, \Sigma)$ for generating $v_k$. We take $\mu = 0$ and $\Sigma = \nu I_{d_1+d_2}$ throughout. Varying $\nu$ (or dimensions $d_r$) changes the information provided by the covariates (Appendix E). Larger $\nu$ causes $v_k$ to be further apart, hence covariates are more informative. $\nu = 0$ corresponds to zero covariate information. We also fix covariate noise levels at $\sigma_r = 0.5$ for $r = 1, 2$, and the network size at $N = (N_1, N_2) = (200, 800)$. Pick tolerance $\varepsilon \in (0, 1]$. |
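
The simulation settings quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on assumptions not stated above: cluster centres are drawn as v_k ~ N(0, νI), each node's covariate is its cluster's centre (restricted to that side's coordinates) plus Gaussian noise with standard deviation σ_r, and labels are sampled uniformly. The function name `simulate_covariates` and the defaults for `K` and `d` are hypothetical.

```python
import numpy as np

def simulate_covariates(K=3, d=(2, 2), nu=1.0, sigma=(0.5, 0.5),
                        N=(200, 800), seed=0):
    """Sketch of covariate generation under the quoted settings.

    K and d are hypothetical defaults; the covariate model (centre plus
    isotropic Gaussian noise) is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    d1, d2 = d
    n1, n2 = N

    # Cluster centres v_k ~ N(mu, Sigma) with mu = 0 and Sigma = nu * I.
    # For an isotropic covariance this equals sqrt(nu) * standard normal.
    v = np.sqrt(nu) * rng.standard_normal((K, d1 + d2))

    # Assumed: labels drawn uniformly over the K clusters on each side.
    z1 = rng.integers(0, K, size=n1)
    z2 = rng.integers(0, K, size=n2)

    # Assumed: side-r covariates are the matching block of v_k plus noise.
    X1 = v[z1, :d1] + sigma[0] * rng.standard_normal((n1, d1))
    X2 = v[z2, d1:] + sigma[1] * rng.standard_normal((n2, d2))
    return v, z1, z2, X1, X2

if __name__ == "__main__":
    v, z1, z2, X1, X2 = simulate_covariates(nu=4.0)
    print(X1.shape, X2.shape)
```

Setting `nu=0.0` collapses all centres to the origin, so under these assumptions the covariates carry no cluster information, which matches the statement that ν = 0 corresponds to zero covariate information.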