Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Scaling Epidemic Inference on Contact Networks: Theory and Algorithms

Authors: Guanghui Min, Yinhan He, Chen Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct extensive experiments on six real-world datasets, demonstrating our method's effectiveness and robustness in estimating the nodes' final state distribution. Specifically, our proposed method consistently produces accurate estimates aligned with results from a large number of MC simulations, while maintaining a runtime comparable to a single MC simulation.
Researcher Affiliation | Academia | Guanghui Min, Yinhan He, Chen Chen (University of Virginia)
Pseudocode | Yes | Algorithm 1: Sketch of PID; Algorithm 2: Sketch of RAPID; Algorithm 1: Monte Carlo Simulation for SIR; Algorithm 2: PID: Probabilistic Infection Dynamics; Algorithm 3: RAPID: Residual-Accelerated Propagation for Infection Dynamics
Open Source Code | Yes | Our code and datasets are available at https://github.com/GuanghuiMin/RAPID.
Open Datasets | Yes | Our code and datasets are available at https://github.com/GuanghuiMin/RAPID. We use graphs from diverse domains, including a real-world hospital contact network (carilion-Hospital [1]), a real-world HIV transmission network (hiv-Trans [40]), communication networks (email-Enron [35, 29], email-EuAll [34]), and social networks (soc-Epinions [52], soc-Pokec [54]).
Dataset Splits | No | The paper does not provide training/validation/test splits. Instead, it estimates ground-truth infection probabilities by running a large number of Monte Carlo simulations (e.g., 50-run or 1000-run MC simulations) over each entire dataset, so there are no separate training and evaluation phases.
Hardware Specification | Yes | All experiments were conducted on a machine equipped with an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz processor with 376 GB Memory.
Software Dependencies | No | All algorithms are implemented in Python with the NetworkX library. No version numbers are given for Python or NetworkX, so the software dependencies are not fully reproducible.
Experiment Setup | Yes | For RAPID, the number of preheat steps p is set to 20, and the propagation residual threshold is set to 10^-3. ... For reporting and baseline comparisons, we adopt the parameter setting of Nipah Virus (β = 1/18, γ = 1/9) with an initial infection fraction of α = 0.01.
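
The Monte Carlo baseline mentioned in the rows above can be illustrated with a minimal sketch. This is a generic discrete-time SIR simulation, not the authors' Algorithm 1, PID, or RAPID. It assumes synchronous updates, a uniformly random seed set of size round(α·n) (at least one node), and that a node's "final state" is whether it was ever infected. The function names, the plain adjacency-dict graph (used instead of NetworkX to keep the sketch dependency-free), and the toy ring graph are all hypothetical; only β = 1/18, γ = 1/9, α = 0.01, and the 50-run count come from the table.

```python
import random


def simulate_sir(adj, beta, gamma, alpha, rng, max_steps=10_000):
    """Run one discrete-time SIR epidemic and return the set of ever-infected nodes.

    adj   : dict mapping each node to an iterable of its neighbours
    beta  : per-contact, per-step infection probability
    gamma : per-step recovery probability
    alpha : fraction of nodes infected at time 0 (assumed: uniform random seeds)
    """
    nodes = list(adj)
    n_seeds = max(1, round(alpha * len(nodes)))
    infected = set(rng.sample(nodes, n_seeds))
    recovered = set()
    for _ in range(max_steps):
        if not infected:
            break
        newly_infected = set()
        # sorted() keeps the random draws in a reproducible order for a fixed seed
        for u in sorted(infected):
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        newly_recovered = {u for u in sorted(infected) if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return recovered | infected


def estimate_infection_probabilities(adj, beta, gamma, alpha, runs, seed=0):
    """Average `runs` independent simulations into per-node infection frequencies."""
    rng = random.Random(seed)
    counts = {node: 0 for node in adj}
    for _ in range(runs):
        for node in simulate_sir(adj, beta, gamma, alpha, rng):
            counts[node] += 1
    return {node: c / runs for node, c in counts.items()}


if __name__ == "__main__":
    # Toy 200-node ring graph (hypothetical; the paper uses six real-world networks).
    n = 200
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    probs = estimate_infection_probabilities(
        ring, beta=1 / 18, gamma=1 / 9, alpha=0.01, runs=50, seed=42
    )
    print(f"mean estimated infection probability: {sum(probs.values()) / n:.3f}")
```

Because every estimate averages many full simulations, the cost grows linearly with the number of runs, which is the overhead the table's Research Type row says the proposed method avoids.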