Reproducibility Index

Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," under review, 2026).

Inference for Network Regression Models with Community Structure

Authors: Mengjie Pan, Tyler McCormick, Bailey Fosdick

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate the performance of our proposed block-exchangeable error model, we generate data from a modified latent space model (Hoff, 2005), which satisfies the requirements for block exchangeability. We consider a simple regression model with one covariate, as in (3) where both coefficients equal 1. We consider three settings for the relationship between the covariate and block structure, and three types of covariates. Figure 3 shows the coverage of 95% confidence intervals for β1 for all nine simulation settings. ... We demonstrate our method on data representing passenger volume between US airports (Bureau of Transportation Statistics, 2016).
Researcher Affiliation	Collaboration	(1) Facebook, Seattle, Washington, USA; (2) Department of Statistics and Department of Sociology, University of Washington, Seattle, Washington, USA; (3) Department of Statistics, Colorado State University, Fort Collins, Colorado, USA.
Pseudocode	Yes	Algorithm 1: Known block estimation of Ω_B ... Algorithm 2: Block membership estimation
Open Source Code	No	The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We demonstrate our method on data representing passenger volume between US airports (Bureau of Transportation Statistics, 2016). The number of passenger seats is a right-tailed skewed distribution, so we use the values y_ij = log(y_ij + 1) as the relational observations for the regression model in (1). For covariates, we calculated the great circle distance between two airports using their longitudes and latitudes. Additionally, we identified the county of the municipality of each airport, and found the total GDP of that county from the Bureau of Economic Analysis (2015) and average payroll of an employed person from Bureau (2015).
Dataset Splits	No	For the simulation studies, the paper describes data generation ('we generate data from a modified latent space model... We consider networks of size 20, 40, 80, and 160') rather than specific train/validation/test splits of a pre-existing dataset. For the Air Traffic Data, it does not explicitly provide details about dataset splits for training or validation.
Hardware Specification	No	The paper mentions computational time on a 'standard machine' and 'standard laptop' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies	No	The paper mentions using 'optim in R' but does not specify the version numbers for R or the 'optim' package/function.
Experiment Setup	Yes	We consider a simple regression model with one covariate, as in (3) where both coefficients equal 1. We generated 1000 errors for each of 500 simulations of the covariates and block memberships, and considered networks of size 20, 40, 80, and 160. ... To numerically optimize the pseudo-likelihood, we used optim in R, with method="L-BFGS-B". We do not set bounds on β, but did place a lower bound of 1e-2 for all variance parameters and a bound of [-0.9, 0.9] for all correlation parameters.
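The covariate and response construction described in the Open Datasets row can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (none is released): the haversine formula, the 6371 km mean Earth radius, and the function names great_circle_km and log_seats are our own hypothetical choices. The summary only states that great circle distances were computed from airport longitudes and latitudes and that seat counts were transformed as log(y_ij + 1).

```python
import math

# Assumption: mean Earth radius in km; the paper summary does not state the value or units used.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius: float = EARTH_RADIUS_KM) -> float:
    """Haversine great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def log_seats(seats: float) -> float:
    """Map a right-skewed seat count y_ij to log(y_ij + 1); log1p is accurate near zero."""
    return math.log1p(seats)


# Hypothetical usage: a distance covariate and a transformed response for one pair.
x_ij = great_circle_km(0.0, 0.0, 0.0, 90.0)  # a quarter of the equator
y_ij = log_seats(0)                          # a pair with no seats maps to 0
```

Here x_ij is about 10,007.5 km (a quarter of the equator under the assumed radius), and a zero seat count maps to 0, which is why the +1 offset is used before taking logs.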
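The optimizer settings in the Experiment Setup row (R's optim with method="L-BFGS-B", β unbounded, variance parameters bounded below by 1e-2, correlation parameters restricted to [-0.9, 0.9]) can be mirrored in Python with SciPy's L-BFGS-B implementation. The pseudo-likelihood itself is not given in this summary, so toy_objective below is a hypothetical quadratic stand-in with one parameter of each kind, chosen so that the variance and correlation bounds are active at the optimum; only the bounds structure reflects the reported setup.

```python
import numpy as np
from scipy.optimize import minimize


def toy_objective(theta: np.ndarray) -> float:
    # Hypothetical stand-in for the negative log pseudo-likelihood (not given in the summary).
    # theta = (beta, sigma2, rho): one coefficient, one variance, one correlation.
    beta, sigma2, rho = theta
    return (beta - 2.0) ** 2 + sigma2 ** 2 + (rho - 1.5) ** 2


def toy_gradient(theta: np.ndarray) -> np.ndarray:
    # Analytic gradient of toy_objective, passed to avoid finite-difference noise.
    beta, sigma2, rho = theta
    return np.array([2.0 * (beta - 2.0), 2.0 * sigma2, 2.0 * (rho - 1.5)])


# Bounds mirror the reported setup: beta unbounded, variance >= 1e-2, correlation in [-0.9, 0.9].
bounds = [(None, None), (1e-2, None), (-0.9, 0.9)]

result = minimize(
    toy_objective,
    x0=np.array([0.0, 1.0, 0.0]),
    jac=toy_gradient,
    method="L-BFGS-B",
    bounds=bounds,
)
```

With this objective the unconstrained minimizer (σ² = 0, ρ = 1.5) lies outside the box, so the optimizer returns β ≈ 2 while the variance and correlation parameters stop at their bounds (0.01 and 0.9). That is the behavior the bounds are meant to enforce, keeping variance estimates positive and correlation estimates away from ±1.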