Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Exploiting Discovered Regression Discontinuities to Debias Conditioned-on-Observable Estimators

Authors: Benjamin Jakubowski, Sriram Somanchi, Edward McFowland III, Daniel B. Neill

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of our DEE method, we present results on two synthetic datasets, then apply DEE to a recent problem from the economic development literature (Asher and Novosad, 2020). Within each simulation, we compare our method's performance varying the prior variances in τ(x) and in β(x).
Researcher Affiliation | Academia | Benjamin Jakubowski, Machine Learning for Good Laboratory, New York University, Brooklyn, NY 11201, USA; Sriram Somanchi, IT, Analytics, and Operations, University of Notre Dame, South Bend, IN 46556, USA; Edward McFowland III, Technology, Operations, and Management, Harvard University, Boston, MA 02163, USA; Daniel B. Neill, Machine Learning for Good Laboratory, New York University, Brooklyn, NY 11201, USA
Pseudocode | Yes | Algorithm 1: Voronoi KNN repair procedure used by DEE. Algorithm 2: LoRD3 procedure, used by DEE, to automatically discover local regression discontinuities.
Open Source Code | Yes | Our code for DEE is available at https://github.com/ssomanch/DEE.
Open Datasets | Yes | We use a subset of rural villages from the full replication dataset of Asher and Novosad (2020) provided on OpenICPSR.
Dataset Splits | No | The paper uses two synthetic datasets (generated samples) and a real-world dataset. For the synthetic datasets, it states, "For each parameter configuration in the 2 × 2 parameter grid (θτ ∈ {0.2, 0.5}, θβ ∈ {0.2, 0.5}), we draw N = 20,000 samples from this DGP" and "sample a training set D with |D| = N = 20,000 instances." For evaluation, it states, "Figure 3 averages (τ(x) − µ(x))² over a 100 × 100 mesh grid uniformly covering [0, 1]²." For the real-world dataset, it states, "we include all 35,273 villages with populations between 300 and 1,300 that satisfied these initial criteria." While data generation and evaluation procedures are described, no explicit training/validation/test splits are reported for either the synthetic or real-world datasets.
Hardware Specification | No | The paper mentions "GPU acceleration" in the context of the gpytorch library for Gaussian Processes but does not specify any particular GPU models, CPU models, memory configurations, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions "gpytorch (Gardner et al., 2018)" as a tool used for Gaussian Process fitting. However, it does not specify a version number for gpytorch or for any other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | RD Discovery: LoRD3 was applied with k = 200 nearest neighbors, and a degree-4 polynomial baseline treatment propensity model. We selected the M = |L| = 400 points with the highest LLR. CATE Estimation: The CATE was estimated using Voronoi KNN repair (Algorithm 1) with parameters k = 1000 and t = 30. Extrapolation: As an observational estimator, we fit a causal forest (Wager and Athey, 2018). To extrapolate, we use Gaussian Processes fit via marginal likelihood maximization using gpytorch (Gardner et al., 2018), with (i) constant mean parameters, and (ii) isotropic Gaussian (RBF) kernels parameterized by an output scale and length scale.
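The evaluation metric quoted under Dataset Splits, averaging (τ(x) − µ(x))² over a 100 × 100 mesh uniformly covering [0, 1]², can be sketched in plain Python. The `true_cate` and `estimate` surfaces below are illustrative placeholders, not the paper's DGP or its fitted estimator:

```python
import itertools

def mesh_grid_mse(tau, mu, n=100):
    """Average (tau(x) - mu(x))**2 over an n-by-n mesh uniformly covering [0, 1]^2."""
    pts = [(i / (n - 1), j / (n - 1)) for i, j in itertools.product(range(n), range(n))]
    return sum((tau(x) - mu(x)) ** 2 for x in pts) / len(pts)

# Illustrative placeholder surfaces (not from the paper):
true_cate = lambda x: x[0] + x[1]
estimate = lambda x: x[0] + x[1] + 0.1   # constant bias of 0.1

print(mesh_grid_mse(true_cate, estimate))  # a constant bias of 0.1 yields an MSE of about 0.01
```

A uniform mesh average like this approximates the integrated squared error of the CATE estimate over the covariate domain.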
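The extrapolation step's isotropic Gaussian (RBF) kernel, parameterized by an output scale and a length scale, can be written out as a minimal sketch. This uses one common convention, σ² · exp(−‖x₁ − x₂‖² / (2ℓ²)); it is not gpytorch's exact parameterization, which factors the output scale through a separate ScaleKernel:

```python
import math

def rbf_kernel(x1, x2, output_scale=1.0, length_scale=1.0):
    """Isotropic RBF kernel: output_scale^2 * exp(-||x1 - x2||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return output_scale ** 2 * math.exp(-sq_dist / (2.0 * length_scale ** 2))

print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))  # kernel at zero distance equals output_scale^2 = 1.0
print(rbf_kernel((0.0, 0.0), (1.0, 0.0)))  # decays to exp(-0.5) at unit distance
```

In the paper's setup, the output scale and length scale are the hyperparameters learned by marginal likelihood maximization.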