Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Statistical Inference of Random Graphs With a Surrogate Likelihood Function
Authors: Dingbo Wu, Fangzheng Xie
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical performance of the proposed surrogate-likelihood-based methods is validated through the analyses of simulation examples and two real-world data sets. |
| Researcher Affiliation | Academia | Dingbo Wu, Department of Statistics, Indiana University, Bloomington, IN 47405, USA and Fangzheng Xie, Department of Statistics, Indiana University, Bloomington, IN 47405, USA |
| Pseudocode | Yes | We present the detailed stochastic gradient descent algorithm for computing the maximum surrogate likelihood estimator in Algorithm 1, the convergence of which is guaranteed by Theorem 9 below. Algorithm 1 Stochastic gradient descent for maximum surrogate likelihood estimation |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. It only mentions the license for the paper itself and that a detailed algorithm is provided in the supplementary material, which refers to the pseudocode. |
| Open Datasets | Yes | The network data is structured as follows: The vertices represent 1382 Wikipedia articles that are connected to the article named Algebraic geometry within two hyperlinks... The data set is publicly available at http://www.cis.jhu.edu/~parky/Data/data.html. and We now consider the political blogs network (Adamic and Glance, 2005), a benchmark network data that has also been analyzed by Karrer and Newman (2011); Zhao et al. (2012); Amini et al. (2013); Jin (2015); Bickel and Sarkar (2015); Le et al. (2016). |
| Dataset Splits | No | The paper describes generating synthetic data and using real-world datasets, but it does not specify any training/test/validation splits, sample counts for splits, or cross-validation strategies applied to its own proposed methods. For real-world datasets, it applies clustering to the full estimated latent positions without detailing data partitioning for model training or evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'the R built-in optim function' and 'coda::heidel.diag() in R' but does not specify the version numbers for R or the 'coda' package, nor any other key software dependencies with their respective versions. |
| Experiment Setup | Yes | For the Bayes estimate, we use the uniform prior on the unit disk for all x_i. The Metropolis-Hastings sampler is implemented with parallelization over vertices i ∈ [n], and each Markov chain contains 1000 burn-in iterations and 2000 post-burn-in samples with a thinning of 5. The posterior mean is taken as the point estimate. and For the MSLE, we implement the step-halving stochastic gradient descent algorithm with the batch size set to s = 500 and s = n (giving rise to the classical gradient descent algorithm) to compare the computational costs. |
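
The sampler settings quoted in the Experiment Setup row (uniform prior on the unit disk, 1000 burn-in iterations, 2000 post-burn-in samples, thinning of 5, posterior mean as point estimate) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports using R, whereas this sketch uses Python, and the function name `metropolis_hastings_unit_disk`, the random-walk proposal, the step size, and the toy log-likelihood are all hypothetical. It also assumes that the 2000 samples are counted after thinning (so 10 000 post-burn-in iterations are run); the quoted text could equally be read as 2000 iterations thinned down to 400 draws.

```python
import numpy as np

def metropolis_hastings_unit_disk(log_lik, n_burn=1000, n_keep=2000, thin=5,
                                  step=0.1, seed=0):
    """Random-walk Metropolis-Hastings with a uniform prior on the unit disk.

    `log_lik` is a hypothetical user-supplied log-likelihood of a 2-d point.
    Assumption: `n_keep` counts the samples retained after thinning.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                      # start at the centre of the disk
    cur = log_lik(x)
    kept = []
    total_iters = n_burn + n_keep * thin
    for t in range(total_iters):
        prop = x + step * rng.standard_normal(2)
        # Uniform prior: proposals outside the unit disk have zero density
        # and are rejected outright.
        if prop @ prop <= 1.0:
            new = log_lik(prop)
            if np.log(rng.uniform()) < new - cur:
                x, cur = prop, new
        # After burn-in, keep every `thin`-th state of the chain.
        if t >= n_burn and (t - n_burn + 1) % thin == 0:
            kept.append(x.copy())
    return np.asarray(kept)

# Toy log-likelihood for illustration only (not the paper's model).
def toy_log_lik(x):
    return -0.5 * np.sum((x - np.array([0.3, -0.2])) ** 2) / 0.05

samples = metropolis_hastings_unit_disk(toy_log_lik)
posterior_mean = samples.mean(axis=0)    # point estimate, as in the quoted setup
```

In the quoted setup one such chain would be run per vertex, in parallel; the sketch shows a single chain only.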