Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Exploiting Discovered Regression Discontinuities to Debias Conditioned-on-Observable Estimators

Authors: Benjamin Jakubowski, Sriram Somanchi, Edward McFowland III, Daniel B. Neill

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of our DEE method, we present results on two synthetic datasets, then apply DEE to a recent problem from the economic development literature (Asher and Novosad, 2020). Within each simulation, we compare our method's performance varying the prior variances in τ(x) and in β(x).
Researcher Affiliation | Academia | Benjamin Jakubowski, Machine Learning for Good Laboratory, New York University, Brooklyn, NY 11201, USA; Sriram Somanchi, IT, Analytics, and Operations, University of Notre Dame, South Bend, IN 46556, USA; Edward McFowland III, Technology, Operations, and Management, Harvard University, Boston, MA 02163, USA; Daniel B. Neill, Machine Learning for Good Laboratory, New York University, Brooklyn, NY 11201, USA
Pseudocode | Yes | Algorithm 1: Voronoi KNN repair procedure used by DEE. Algorithm 2: LoRD3 procedure, used by DEE, to automatically discover local regression discontinuities.
Open Source Code | Yes | Our code for DEE is available at https://github.com/ssomanch/DEE.
Open Datasets | Yes | We use a subset of rural villages from the full replication dataset of Asher and Novosad (2020) provided on OpenICPSR.
Dataset Splits | No | The paper uses two synthetic datasets (generated samples) and a real-world dataset. For the synthetic datasets, it states, "For each parameter configuration in the 2 × 2 parameter grid (θτ ∈ {0.2, 0.5}, θβ ∈ {0.2, 0.5}), we draw N = 20,000 samples from this DGP" and "sample a training set D with |D| = N = 20,000 instances." For evaluation, it states, "Figure 3 averages (τ(x) − µ(x))² over a 100 × 100 mesh grid uniformly covering [0, 1]²." For the real-world dataset, it states, "we include all 35,273 villages with populations between 300 and 1,300 that satisfied these initial criteria." While data generation and evaluation procedures are described, no explicit training/validation/test splits are reported for either the synthetic or real-world datasets.
Hardware Specification | No | The paper mentions "GPU acceleration" in the context of the gpytorch library for Gaussian Processes but does not specify any particular GPU models, CPU models, memory configurations, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions "gpytorch (Gardner et al., 2018)" as a tool used for Gaussian Process fitting. However, it does not specify a version number for gpytorch or for any other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | RD Discovery: LoRD3 was applied with k = 200 nearest neighbors, and a degree-4 polynomial baseline treatment propensity model. We selected the M = |L| = 400 points with the highest LLR. CATE Estimation: The CATE was estimated using Voronoi KNN repair (Algorithm 1) with parameters k = 1000 and t = 30. Extrapolation: As an observational estimator, we fit a causal forest (Wager and Athey, 2018). To extrapolate, we use Gaussian Processes fit via marginal likelihood maximization using gpytorch (Gardner et al., 2018), with (i) constant mean parameters, and (ii) isotropic Gaussian (RBF) kernels parameterized by an output scale and length scale.
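The evaluation metric quoted under Dataset Splits, averaging (τ(x) − µ(x))² over a 100 × 100 mesh uniformly covering [0, 1]², can be sketched in plain Python. The `true_cate` and `estimate` surfaces below are illustrative placeholders, not the paper's DGP or its fitted estimator:

```python
import itertools

def mesh_grid_mse(tau, mu, n=100):
    """Average (tau(x) - mu(x))**2 over an n-by-n mesh uniformly covering [0, 1]^2."""
    pts = [(i / (n - 1), j / (n - 1)) for i, j in itertools.product(range(n), range(n))]
    return sum((tau(x) - mu(x)) ** 2 for x in pts) / len(pts)

# Illustrative placeholder surfaces (not from the paper):
true_cate = lambda x: x[0] + x[1]
estimate = lambda x: x[0] + x[1] + 0.1   # constant bias of 0.1

print(mesh_grid_mse(true_cate, estimate))  # a constant bias of 0.1 yields an MSE of about 0.01
```

A uniform mesh average like this approximates the integrated squared error of the CATE estimate over the covariate domain.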
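The extrapolation step's isotropic Gaussian (RBF) kernel, parameterized by an output scale and a length scale, can be written out as a minimal sketch. This uses one common convention, σ² · exp(−‖x₁ − x₂‖² / (2ℓ²)); it is not gpytorch's exact parameterization, which factors the output scale through a separate ScaleKernel:

```python
import math

def rbf_kernel(x1, x2, output_scale=1.0, length_scale=1.0):
    """Isotropic RBF kernel: output_scale^2 * exp(-||x1 - x2||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return output_scale ** 2 * math.exp(-sq_dist / (2.0 * length_scale ** 2))

print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))  # kernel at zero distance equals output_scale^2 = 1.0
print(rbf_kernel((0.0, 0.0), (1.0, 0.0)))  # decays to exp(-0.5) at unit distance
```

In the paper's setup, the output scale and length scale are the hyperparameters learned by marginal likelihood maximization.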