Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Statistical Inference of Random Graphs With a Surrogate Likelihood Function

Authors: Dingbo Wu, Fangzheng Xie

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The empirical performance of the proposed surrogate-likelihood-based methods is validated through the analyses of simulation examples and two real-world data sets.
Researcher Affiliation	Academia	Dingbo Wu EMAIL Department of Statistics Indiana University Bloomington, IN 47405, USA and Fangzheng Xie EMAIL Department of Statistics Indiana University Bloomington, IN 47405, USA
Pseudocode	Yes	We present the detailed stochastic gradient descent algorithm for computing the maximum surrogate likelihood estimator in Algorithm 1, the convergence of which is guaranteed by Theorem 9 below. Algorithm 1 Stochastic gradient descent for maximum surrogate likelihood estimation
Open Source Code	No	The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. It only mentions the license for the paper itself and that a detailed algorithm is provided in the supplementary material, which refers to the pseudocode.
Open Datasets	Yes	The network data is structured as follows: The vertices represent 1382 Wikipedia articles that are connected to the article named Algebraic geometry within two hyperlinks... The data set is publicly available at http://www.cis.jhu.edu/~parky/Data/data.html. and We now consider the political blogs network (Adamic and Glance, 2005), a benchmark network data that has also been analyzed by Karrer and Newman (2011); Zhao et al. (2012); Amini et al. (2013); Jin (2015); Bickel and Sarkar (2015); Le et al. (2016).
Dataset Splits	No	The paper describes generating synthetic data and using real-world datasets, but it does not specify any training/test/validation splits, sample counts for splits, or cross-validation strategies applied to its own proposed methods. For real-world datasets, it applies clustering to the full estimated latent positions without detailing data partitioning for model training or evaluation.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to conduct the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper mentions using 'the R built-in optim function' and 'coda::heidel.diag() in R' but does not specify the version numbers for R or the 'coda' package, nor any other key software dependencies with their respective versions.
Experiment Setup	Yes	For the Bayes estimate, we use the uniform prior on the unit disk for all x_i. The Metropolis-Hastings sampler is implemented with parallelization over vertices i ∈ [n], and each Markov chain contains 1000 burn-in iterations and 2000 post-burn-in samples with a thinning of 5. The posterior mean is taken as the point estimate. and For the MSLE, we implement the step-halving stochastic gradient descent algorithm with the batch size set to s = 500 and s = n (giving rise to the classical gradient descent algorithm) to compare the computational costs.
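
The sampler settings quoted in the Experiment Setup row (1000 burn-in iterations, 2000 post-burn-in samples, thinning of 5, uniform prior on the unit disk, posterior mean as point estimate) can be illustrated with a minimal sketch. This is not the authors' code: the paper reports using R, while the sketch below is Python, and the function names, the random-walk proposal, the step size, and the toy log-likelihood are all hypothetical. It also assumes that "2000 post-burn-in samples with a thinning of 5" means 2000 retained draws, that is, 10 000 post-burn-in iterations; the quoted text could also be read as 2000 iterations thinned to 400 draws.

```python
import numpy as np

def metropolis_hastings_disk(log_lik, x0, n_burn=1000, n_keep=2000, thin=5,
                             step=0.1, seed=0):
    """Random-walk Metropolis-Hastings with a uniform prior on the unit disk.

    Hypothetical helper: the proposal and step size are illustrative choices,
    not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    cur = log_lik(x)
    kept = []
    total = n_burn + n_keep * thin  # assumption: n_keep counts retained draws
    for t in range(total):
        prop = x + step * rng.standard_normal(x.shape)
        # Uniform prior on the unit disk: proposals outside it have zero
        # prior mass and are rejected outright.
        if prop @ prop <= 1.0:
            new = log_lik(prop)
            if np.log(rng.uniform()) < new - cur:
                x, cur = prop, new
        # After burn-in, keep every `thin`-th state of the chain.
        if t >= n_burn and (t - n_burn) % thin == thin - 1:
            kept.append(x.copy())
    return np.array(kept)

def toy_log_lik(x):
    """Toy Gaussian log-likelihood (hypothetical stand-in for the model)."""
    center = np.array([0.3, -0.2])
    return -0.5 * np.sum((x - center) ** 2) / 0.05 ** 2

samples = metropolis_hastings_disk(toy_log_lik, x0=[0.0, 0.0])
posterior_mean = samples.mean(axis=0)  # point estimate, as in the quoted setup
```

In the quoted setup one such chain is run per vertex, in parallel; the sketch shows a single chain only.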