Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Beyond the Average: Distributional Causal Inference under Imperfect Compliance

Authors: Undral Byambadalai, Tomu Hirata, Tatsushi Oka, Shota Yasui

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results demonstrate favorable finite-sample performance, and we demonstrate the method's practical relevance in an application to the Oregon Health Insurance Experiment. ... 6 Experiments 6.1 Simulation Study 6.2 Real Data Analysis: Oregon Health Insurance Experiment
Researcher Affiliation	Collaboration	Undral Byambadalai CyberAgent, Inc., Tokyo, Japan EMAIL Tomu Hirata Databricks Japan, Inc., Tokyo, Japan EMAIL Tatsushi Oka Keio University, Tokyo, Japan EMAIL Shota Yasui CyberAgent, Inc., Tokyo, Japan EMAIL
Pseudocode	Yes	Algorithm 1 ML Regression-Adjusted LDTE Estimator with Cross-Fitting
Open Source Code	Yes	The code is publicly available at https://github.com/CyberAgentAILab/ldte, and the method can be implemented using the Python library dte-adj (https://pypi.org/project/dte-adj/).
Open Datasets	Yes	We validate our approach through simulation studies and an empirical application to the Oregon Health Insurance Experiment... The dataset is publicly available at https://www.nber.org/research/data/oregon-health-insurance-experiment-data.
Dataset Splits	Yes	All adjusted estimators use 2-fold cross-fitting. ... For regression adjustment, we use gradient boosting with 5-fold cross-fitting, with 28 pre-treatment covariates (Xi)
Hardware Specification	Yes	All experiments are run on a MacBook Pro with 36 GB memory and the Apple M3 Pro chip.
Software Dependencies	No	The paper mentions 'Python library dte-adj' but does not specify a version number for this library or for Python itself. This is insufficient to meet the requirement of specific version numbers for key software components.
Experiment Setup	Yes	The data generating process consists of four strata (S = 4) constructed by partitioning the support of a covariate W_i ~ U(0, 1) into S equal-length intervals, where S_i indicates the interval containing W_i. For each unit i, we draw an additional 20-dimensional covariate vector X_i = (X_{1,i}, ..., X_{20,i}) from a multivariate normal distribution N(0, I_{20×20}). The treatment indicator Z_i follows a Bernoulli distribution with probability 0.5 within each stratum... We draw a sample of sizes {500, 1000, 5000} from the data-generating process and estimate the LDTE at quantiles {0.1, ..., 0.9} using three methods with 1000 simulations: an unadjusted estimator, a linear regression-adjusted estimator, and a machine learning-adjusted estimator based on gradient boosting. ... For regression adjustment, we use gradient boosting with 5-fold cross-fitting, with 28 pre-treatment covariates (X_i) including various variables regarding past emergency department visits.
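
The design part of the simulation setup quoted above (strata, covariates, and treatment assignment) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' code: the function names draw_unit and draw_sample, the dictionary layout, and the seed handling are hypothetical, and the compliance and outcome models are left out because the quoted excerpt elides them.

```python
import random

N_STRATA = 4        # S = 4 strata, as quoted in the setup
N_COVARIATES = 20   # dimension of the covariate vector X_i
P_TREAT = 0.5       # Bernoulli assignment probability within each stratum


def draw_unit(rng):
    """Draw (W_i, S_i, X_i, Z_i) for one unit; hypothetical helper."""
    w = rng.random()  # W_i ~ U(0, 1)
    # S_i is the index (1..S) of the equal-length interval containing W_i.
    s = min(int(w * N_STRATA), N_STRATA - 1) + 1
    # X_i ~ N(0, I): independent standard normal components.
    x = [rng.gauss(0.0, 1.0) for _ in range(N_COVARIATES)]
    # Z_i ~ Bernoulli(0.5), the same probability in every stratum.
    z = 1 if rng.random() < P_TREAT else 0
    # Treatment receipt and the outcome are not generated here:
    # the quoted excerpt does not specify them.
    return {"W": w, "S": s, "X": x, "Z": z}


def draw_sample(n, seed=0):
    """Draw n independent units with a reproducible seed (an assumption)."""
    rng = random.Random(seed)
    return [draw_unit(rng) for _ in range(n)]


# One simulated design of the smallest sample size mentioned in the setup.
sample = draw_sample(500)
```

Repeating draw_sample with different seeds for each of the sample sizes {500, 1000, 5000} would mirror the structure of the 1000 simulation runs described in the setup, once a compliance and outcome model is supplied.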