Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Data-Adaptive Exposure Thresholds under Network Interference

Authors: Vydhourie Thiyageswaran, Tyler H. McCormick, Jennifer Brennan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present simulations illustrating that our method improves upon non-adaptive threshold choices, and an adapted Lepski's method. We further illustrate the performance of our estimator by running experiments with synthetic outcomes on a real village network dataset, and on a publicly-available Amazon product similarity graph. Furthermore, we demonstrate that our method remains robust to deviations from the linear potential outcomes model.
Researcher Affiliation	Collaboration	Vydhourie Thiyageswaran Department of Statistics University of Washington Seattle, WA, USA EMAIL Tyler H. McCormick Department of Statistics University of Washington Seattle, WA, USA EMAIL Jennifer Brennan Google Research Kirkland, WA, USA EMAIL
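Pseudocode	Yes	Algorithm 1 AdaThresh. Require: graph adjacency matrix $W$, outcome vector $Y$, treatment vector $z$. 1: Compute exposure: $e \leftarrow D^{-1} W z$. 2: Fit linear model: $Y = \beta z + \gamma e + c$. 3: Let $\hat{\gamma}$ be the estimated coefficient for $e$. 4: for each threshold $h \in H$ do 5: Estimate bias: $\widehat{\mathrm{Bias}}(h)$ using $\hat{\gamma}$, $Y$, $z$, $W$, and $h$ (see Eq. (6)). 6: Estimate variance: $\widehat{\mathrm{Var}}(h)$ using $Y$, $z$, $W$, and $h$ (see Appendix A.1). 7: Compute $\widehat{\mathrm{MSE}}(h) \leftarrow \widehat{\mathrm{Bias}}^2(h) + \widehat{\mathrm{Var}}(h)$. 8: end for 9: $\hat{h} \leftarrow \arg\min_{h \in H} \widehat{\mathrm{MSE}}(h)$. 10: return $\hat{\tau}_{\hat{h}}$ (see (4)).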
Open Source Code	Yes	The code is available at: https://github.com/Vydhourie/AdaThresh.git
Open Datasets	Yes	We evaluate the performance of our estimator on village (No. 6) network data from [Banerjee et al., 2013]... For larger $n$ and smaller $d_{\max}$, performance improves further, supporting our theoretical findings, as demonstrated in Appendix A.7 on the Amazon (DVD) products similarity network [Leskovec et al., 2007] (see Figure 6), and on various circulant graphs. ... The graph data is available at: https://snap.stanford.edu/data/amazon-meta.html
Dataset Splits	No	The paper does not explicitly describe training/test/validation dataset splits. It describes experimental setups with synthetic outcomes, randomizations (unit-level, cluster-level Bernoulli), and Monte-Carlo trials for exposure probabilities, but not a division of a single dataset into distinct training, validation, and test subsets.
Hardware Specification	Yes	All synthetic graph simulations were run on a machine of Intel Xeon processors with 48 CPU cores, and 50GB of RAM.
Software Dependencies	No	The paper mentions that code is available and describes the experimental setup but does not specify software dependencies with version numbers (e.g., Python, specific libraries, or solvers with versions).
Experiment Setup	Yes	We ran experiments with synthetic potential outcomes, averaging over 1000 trials, under unit-level and cluster-level Bernoulli randomizations... We generate simulated data using the linear model with $\psi(z_i, e_i) = g(z_i) + f(e_i)$, $\alpha_i = 10$, $g(z_i) = \beta z_i = 10 z_i$, $f(e_i) = \gamma e_i$, with fixed $\epsilon_i$ generated from $\mathcal{N}(0, 1)$. To compute the exposure probabilities, we used $2 \times 10^4$ Monte-Carlo trials. We focus on varying the ratio $\gamma/\beta$ as we consider a fixed graph.
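
The quoted pseudocode and experiment setup can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). It covers only the exposure computation, the synthetic linear outcome model, and the regression in steps 1 to 3, plus a generic threshold selector for step 9. Several things here are assumptions: D is taken to be the diagonal degree matrix, outcomes are assumed to combine as Y_i = alpha_i + psi(z_i, e_i) + eps_i, gamma = 5 and the ring graph are arbitrary illustrative choices, and all function names are hypothetical. The bias and variance estimators of Eq. (6) and Appendix A.1 are not reproduced on this page, so they appear only as caller-supplied placeholders.

```python
import numpy as np


def exposure(W, z):
    """Step 1: e = D^{-1} W z, assuming D is the diagonal degree matrix."""
    W = np.asarray(W, dtype=float)
    z = np.asarray(z, dtype=float)
    deg = W.sum(axis=1)
    # Assumption: units with no neighbors get exposure 0.
    return np.divide(W @ z, deg, out=np.zeros_like(deg), where=deg > 0)


def simulate_outcomes(W, z, eps, alpha=10.0, beta=10.0, gamma=5.0):
    """Assumed form: Y_i = alpha_i + beta * z_i + gamma * e_i + eps_i.

    alpha = 10 and beta = 10 follow the quoted setup; gamma = 5 is illustrative.
    """
    e = exposure(W, z)
    return alpha + beta * np.asarray(z, dtype=float) + gamma * e + eps


def fit_linear_model(Y, z, e):
    """Steps 2-3: least-squares fit of Y = beta*z + gamma*e + c."""
    X = np.column_stack([z, e, np.ones(len(Y))])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    beta_hat, gamma_hat, c_hat = coef
    return beta_hat, gamma_hat, c_hat


def select_threshold(thresholds, bias_fn, var_fn):
    """Step 9: pick the h minimizing bias_fn(h)**2 + var_fn(h).

    bias_fn and var_fn are hypothetical placeholders for the paper's estimators.
    """
    mse = [bias_fn(h) ** 2 + var_fn(h) for h in thresholds]
    return thresholds[int(np.argmin(mse))]


def ring_graph(n, k=2):
    """Circulant graph: each node is linked to its k nearest neighbors per side."""
    W = np.zeros((n, n))
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            W[i, j] = W[j, i] = 1.0
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    W = ring_graph(n)
    z = rng.binomial(1, 0.5, size=n).astype(float)  # unit-level Bernoulli design
    eps = rng.normal(0.0, 1.0, size=n)              # noise drawn once, then fixed
    Y = simulate_outcomes(W, z, eps)
    print(fit_linear_model(Y, z, exposure(W, z)))
```

With noise present, the fitted coefficients only approximate the values used to generate the data. Keeping `eps` as an explicit argument mirrors the quoted setup, where the noise is generated once and held fixed across randomizations.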