Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Estimation of Treatment Effects in Extreme and Unobserved Data

Authors: Jiyuan Tan, Vasilis Syrgkanis, Jose Blanchet

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate the performance of our estimator using both synthetic and semi-synthetic data. 4 Experiments Having established in Section 3 that under our regularity and overlap assumptions the DR- and IPW-based extreme treatment estimators enjoy a provable non-asymptotic error bound, we next evaluate their finite-sample behavior and compare with our estimators with naive estimators that does not consider the regularly varying structure. In what follows, Section 4.1 presents purely synthetic simulations with known NETE. Section 4.2 then moves to a semi-synthetic setting using real noise from wavesurge datasets to assess practical performance under realistic complexities.
Researcher Affiliation	Academia	Jiyuan Tan Department of Management Science and Engineering Stanford University EMAIL Jose Blanchet Department of Management Science and Engineering Stanford University EMAIL Vasilis Syrgkanis Department of Management Science and Engineering Stanford University EMAIL
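Pseudocode	Yes	Algorithm 1 Algorithm for NETE Estimation. Require: Dataset $D = \{(X_i, D_i, Y_i, U_i)\}_{i=1}^{n}$, threshold $t$, exponent estimation $\hat\alpha_n$, estimator. 1: Randomly split $D$ into two equal parts $D_1$ and $D_2$. 2: Using $D_1$, estimate: a. Propensity function $\hat p(x)$ via regression of $D$ on $X$ and clip the output of $\hat p(x)$ to the interval $[c, 1-c]$. b. Pseudo-outcome regression $\hat g(x, d, s)$ by regressing $Y/\|U\|^{\hat\alpha_n}$ on $(X, D, U/\|U\|)$. 3: Define index set $I = \{i : \|U_i\| > t,\ (X_i, D_i, Y_i, U_i) \in D_2\}$ and set $S_i = U_i/\|U_i\|$ for $i \in I$. 4: if estimator = IPW then 5: Compute $\hat\eta^{\mathrm{IPW}}_{n,t} = \frac{1}{|I|}\sum_{i \in I}\left(\frac{D_i}{\hat p(X_i)} - \frac{1-D_i}{1-\hat p(X_i)}\right)\frac{Y_i}{\|U_i\|^{\hat\alpha_n}}$. 6: else if estimator = DR then 7: Compute $\hat\eta^{\mathrm{DR}}_{n,t} = \frac{1}{|I|}\sum_{i \in I}\left[\hat g(X_i, 1, S_i) - \hat g(X_i, 0, S_i) + \frac{D_i - \hat p(X_i)}{\hat p(X_i)(1-\hat p(X_i))}\left(\frac{Y_i}{\|U_i\|^{\hat\alpha_n}} - \hat g(X_i, D_i, S_i)\right)\right]$. (3.6) 8: end if 9: Compute adaptive Hill estimator on $\{\|U_i\| : i \in I\}$: $\hat\gamma_n = \hat\gamma(k) = \frac{1}{k}\sum_{j=1}^{k}\log\frac{U_{(j)}}{U_{(k+1)}}$, $\hat\mu_n = \frac{1}{1-\hat\alpha_n\hat\gamma_n}$, (3.7) where $U_{(1)} \ge \dots \ge U_{(k+1)}$ and $k$ is chosen by $k = \max\{k : k \in \{l_n, \dots, n\} \text{ and } \forall i \in \{l_n, \dots, n\},\ |\hat\gamma(i) - \hat\gamma(k)| \le \hat\gamma(i)\, r_n(\delta)\}$. 10: return $\hat\theta^{\mathrm{estimator}}_{n,t} = \hat\eta^{\mathrm{estimator}}_{n,t}\,\hat\mu_n$.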
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code will be provided in the camera-ready version.
Open Datasets	Yes	Now, we use the wavesurge dataset Coles et al. [2001] to create a semi-synthetic dataset for our experiments. The wavesurge dataset has 2894 data points, which contain wave and surge heights at a single location off south-west England.
Dataset Splits	Yes	We split the dataset into a training set (1,000 observations) and a test set (1,894 observations).
Hardware Specification	No	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: All experiments can be easily reproduced using a laptop.
Software Dependencies	No	We run logistic regression to estimate the propensity function and use random forest to model the outcome. For the adaptive Hill estimator Boucheron and Thomas [2015], we follow authors choice for hyperparameters and choose $l_n = 30$, $r_n(\delta) = \sqrt{\log\log(n)}$ and $k = \min\{k : k \in \{l_n, \dots, n\} \text{ and } \exists i \in \{l_n, \dots, n\},\ |\hat\gamma(i) - \hat\gamma(k)| > \hat\gamma(i)\, r_n(\delta)\}$, where $\hat\gamma(i) = \frac{1}{i}\sum_{j=1}^{i}\log\frac{U_{(j)}}{U_{(i+1)}}$.
Experiment Setup	Yes	We clip the propensity to $[10^{-4}, 1-10^{-4}]$ to ensure the overlap assumption (Assumption 2.2). $\epsilon \sim \mathrm{Unif}([-1, 1])$ in the data generation in synthetic experiments. We use sample splitting in our experiment, using the first half for nuisance estimation. In the experiment, we use the same threshold $t$ for all estimators, which is given by Corollary 3.9. To choose the threshold, we first use the adaptive Hill estimator Boucheron and Thomas [2015] to get an estimation of EVI $\hat\gamma_n$ and then set the threshold to be $t = 0.25\, n^{\hat\gamma_n/(1+2\min\{1,\hat\gamma_n\})}$ as in Theorem 3.8. The approximate exponential $\hat\alpha_n$ is coefficient of $\log(\|U\|)$ in linear regression $\log(|Y|) \sim \log(\|U\|)$. For the adaptive Hill estimator Boucheron and Thomas [2015], we follow authors choice for hyperparameters and choose $l_n = 30$, $r_n(\delta) = \sqrt{\log\log(n)}$ and $k = \min\{k : k \in \{l_n, \dots, n\} \text{ and } \exists i \in \{l_n, \dots, n\},\ |\hat\gamma(i) - \hat\gamma(k)| > \hat\gamma(i)\, r_n(\delta)\}$, where $\hat\gamma(i) = \frac{1}{i}\sum_{j=1}^{i}\log\frac{U_{(j)}}{U_{(i+1)}}$.
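
Since the paper's code is not yet available, the following is a minimal, unofficial sketch of the setup quantities quoted in the table, not the authors' implementation. All function names are hypothetical. It makes several assumptions: the Hill estimator is computed for a fixed k rather than with the adaptive selection rule; the threshold formula is read as 0.25 times n raised to the power γ/(1 + 2 min{1, γ}); the doubly robust term follows the standard augmented inverse-propensity form as it appears to read in Algorithm 1; and the nuisance predictions (propensities and pseudo-outcome regressions) are supplied by the caller.

```python
import numpy as np


def hill_estimator(u_norms, k):
    """Hill estimate (1/k) * sum_{j=1..k} log(U_(j) / U_(k+1)),
    where U_(1) >= U_(2) >= ... are the descending order statistics.
    Assumption: k is fixed by the caller (no adaptive selection here)."""
    u_sorted = np.sort(np.asarray(u_norms, dtype=float))[::-1]
    if not 1 <= k < u_sorted.size:
        raise ValueError("k must satisfy 1 <= k < number of observations")
    return float(np.mean(np.log(u_sorted[:k] / u_sorted[k])))


def threshold(n, gamma_hat):
    """Threshold t = 0.25 * n ** (gamma / (1 + 2 * min(1, gamma))).
    Assumption: the quoted formula is read with the ratio as an exponent."""
    return 0.25 * n ** (gamma_hat / (1.0 + 2.0 * min(1.0, gamma_hat)))


def clip_propensity(p_hat, c=1e-4):
    """Clip estimated propensities to [c, 1 - c] to enforce overlap."""
    return np.clip(np.asarray(p_hat, dtype=float), c, 1.0 - c)


def exponent_estimate(y, u_norms):
    """Slope of the least-squares fit of log|Y| on log||U|| (with intercept)."""
    slope, _intercept = np.polyfit(np.log(u_norms), np.log(np.abs(y)), deg=1)
    return float(slope)


def dr_eta(y_scaled, d, p_hat, g1, g0):
    """Doubly robust average over the tail sample (standard AIPW form).
    y_scaled: Y_i / ||U_i||**alpha_hat; d: binary treatment indicator;
    p_hat: clipped propensities; g1, g0: predicted pseudo-outcomes
    under treatment and control (hypothetical inputs from any regressor)."""
    y_scaled, d, p_hat, g1, g0 = (
        np.asarray(a, dtype=float) for a in (y_scaled, d, p_hat, g1, g0)
    )
    g_obs = np.where(d == 1, g1, g0)  # prediction at the observed treatment
    correction = (d - p_hat) / (p_hat * (1.0 - p_hat)) * (y_scaled - g_obs)
    return float(np.mean(g1 - g0 + correction))
```

A typical use would compute `hill_estimator` on the norms of U, pass the result to `threshold` to pick t, and then evaluate `dr_eta` only on held-out observations whose norm exceeds t, with propensities passed through `clip_propensity` first.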