Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect

Authors: Ruiyang Lin, Yongyi Guo, Kyra Gan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using synthetic and real-world data, we validate our theory numerically, showing that the proposed optimal valid adjustment set yields the lowest variance at practical sample sizes. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.
Researcher Affiliation	Academia	Ruiyang Lin (The University of Science and Technology of China); Yongyi Guo (University of Wisconsin-Madison); Kyra Gan (Cornell Tech, Cornell University)
Pseudocode	No	The paper describes the AIPW estimator with formulas in Section 5 and discusses the data-generating process in Appendix D.1 using structural equations. However, it does not present a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. The description is textual and mathematical rather than algorithmic.
Open Source Code	Yes	Code available at https://github.com/Lin-Ruiyang/WCDE-Simulation
Open Datasets	Yes	Real-World Experiments: We evaluate our method on three widely used semi-synthetic Bayesian networks from bnlearn: ASIA, SIGNALING, and MILDEW, which serve as standard benchmarks for causal inference due to their realistic structures and domain relevance [24, 27, 47].
Dataset Splits	No	For each dataset, we validated our theoretical results through a five-step process: (1) selecting a treatment and outcome variable (...); (4) simulating data from the underlying causal model across sample sizes n = 250, 500, 1000, 4000, and 10000; and (5) computing the empirical variance and MSE for each adjustment set. The paper describes generating data for different sample sizes for evaluation, but does not specify explicit training/test/validation splits of a fixed dataset.
Hardware Specification	Yes	All simulations were executed on AWS EC2 c7a.24xlarge instances (96 vCPUs, 192 GB RAM) using Ray for parallelization with deterministic single-thread workers.
Software Dependencies	No	The paper mentions methods like "linear regressions with spline-transformed features (degree 5, 10 knots)", "multinomial logistic regression", "logistic regressions", and "Gaussian kernel density estimation", as well as using "Ray for parallelization". However, it does not provide specific version numbers for any of these software components or libraries.
Experiment Setup	Yes	In our experiments, we implemented the AIPW estimator, constructed using the plug-in components of the influence function: To estimate the conditional expectations μ̂(a, Z1, Z2), we fit linear regressions with spline-transformed features (degree 5, 10 knots) for continuous outcomes, and use multinomial logistic regression for discrete outcomes. The treatment probabilities p̂(A = a | Z1, Z2) are obtained from logistic regressions. The marginal and joint densities are estimated nonparametrically, using empirical frequencies for discrete variables and Gaussian kernel density estimation for continuous ones. (...) For each adjustment set, we report average variance and average MSE across 50 replications. (...) data are generated from a nonlinear structural equation model with random edge coefficients, nonlinear transformations, and additive Gaussian noise. Appendix D.1 contains full details of the data-generating process and estimator implementation. (...) coefficients are sampled independently: with 50% probability from Uniform[-1.5, -0.5], otherwise from Uniform[0.5, 1.5] (...) fj(·) is applied independently to each parent variable Xj before combination and is selected uniformly at random from: Identity: f(x) = x, Sine: f(x) = sin(x), Cosine: f(x) = cos(x). (...) The additive noise term εi is sampled independently from N(0, 1/4) for each node with parents. Root nodes (i.e., nodes with no parents) are sampled from N(0, 1).
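
The synthetic data-generating process quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code (see the repository linked above for that). It rests on several assumptions that the quoted text does not confirm: the transformed parents are combined as a weighted sum, N(0, 1/4) is read as variance 1/4 (standard deviation 0.5), and one coefficient and one transform are fixed per edge. The function names and the three-node graph in the usage note are hypothetical.

```python
import numpy as np

# Candidate edge transforms named in the quoted setup: identity, sine, cosine.
TRANSFORMS = (lambda x: x, np.sin, np.cos)


def sample_coefficient(rng):
    """Draw from Uniform[-1.5, -0.5] or Uniform[0.5, 1.5], each with probability 1/2."""
    magnitude = rng.uniform(0.5, 1.5)
    sign = -1.0 if rng.random() < 0.5 else 1.0
    return sign * magnitude


def build_sem(parents, rng):
    """Fix one coefficient and one transform per edge (assumption: fixed per edge)."""
    sem = {}
    for node, pa in parents.items():
        sem[node] = [
            (p, sample_coefficient(rng), TRANSFORMS[rng.integers(len(TRANSFORMS))])
            for p in pa
        ]
    return sem


def simulate(parents, sem, n, rng):
    """Sample n observations; `parents` must list nodes in topological order."""
    data = {}
    for node in parents:
        edges = sem[node]
        if not edges:
            # Root nodes are drawn from N(0, 1).
            data[node] = rng.normal(0.0, 1.0, size=n)
        else:
            # Assumption: transformed parents are combined as a weighted sum.
            signal = sum(c * f(data[p]) for p, c, f in edges)
            # Assumption: N(0, 1/4) denotes variance 1/4, i.e. standard deviation 0.5.
            data[node] = signal + rng.normal(0.0, 0.5, size=n)
    return data
```

For example, with a hypothetical graph parents = {"Z": [], "A": ["Z"], "Y": ["Z", "A"]} and rng = np.random.default_rng(0), calling build_sem(parents, rng) once and then simulate(parents, sem, n, rng) for each n in 250, 500, 1000, 4000, and 10000 mirrors the sample sizes listed in the Dataset Splits row.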