Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Density Ratio-Free Doubly Robust Proxy Causal Learning

Authors: Bariscan Bozkurt, Houssam Zenati, Dimitri Meunier, Liyuan Xu, Arthur Gretton

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we assess the performance of our proposed estimators for dose-response curve estimation using both synthetic and real-world datasets. We benchmark our methods against several recent state-of-the-art PCL algorithms, including Proximal Kernel Doubly Robust (PKDR) [22], Kernel Negative Control (KNC) [12], Kernel Proxy Variable (KPV) [11], Proximal Maximum Moment Restriction (PMMR) [11], and Kernel Alternative Proxy (KAP) [23]. Except for experiments involving PKDR, we use a Gaussian kernel of the form $k_F(f_i, f_j) = \exp\left(-\|f_i - f_j\|_2^2 / (2l^2)\right)$ for each $F \in \{W, Z, A\}$, where $l$ denotes the kernel bandwidth. The bandwidth is selected using the median heuristic based on pairwise distances. For PKDR, we follow the original implementation by Wu et al. [22] and use the Epanechnikov kernel. We determine the regularization parameter $\lambda_{DR}$ by utilizing the closed-form expression for leave-one-out cross-validation (LOOCV) in kernel ridge regression. Either LOOCV or a held-out validation set is applied for the regularization terms in the treatment and outcome bridge methods, in line with prior approaches in [11, 12, 23]. We provide additional experimental details, including ablation studies on hyperparameter selection and a scalability analysis with the Nyström approximation, in the Supplementary Material (Appendix F). Synthetic Low Dimensional: We adopt the synthetic data generation process from Wu et al. [22], which simulates a confounded, nonlinear, and noisy treatment-outcome relationship: ... Figure (2a) presents the mean squared error (MSE) of different PCL benchmark methods, averaged over 30 independent runs.
Researcher Affiliation	Collaboration	¹Gatsby Computational Neuroscience Unit, University College London; ²Secondmind; ³DeepMind.
Pseudocode	Yes	Algorithm 1: DRKPV / DRPMMR Algorithms; Algorithm 2: Kernel Proxy Variable Algorithm [11, 62]; Algorithm 3: Proximal Maximum Moment Restriction Algorithm [11, 62]; Algorithm 4: Alternative Kernel Proxy Variable Algorithm [23]; Algorithm 5: Doubly Robust Kernel Proxy Variable Algorithm (replication of Algorithm 1 in Section 3.2).
Open Source Code	Yes	Our implementation code is available on GitHub: https://github.com/BariscanBozkurt/Doubly-Robust-Kernel-Proxy-Variable-Algorithm
Open Datasets	Yes	dSprite: We use the Disentanglement testing Sprite dataset (dSprite) [49]. Legalized Abortion and Crime: We evaluate our methods on the Legalized Abortion and Crime dataset [51], following the preprocessing and setup from [11, 22, 23, 52]. We use the version of the dataset available from the GitHub repository of Mastouri et al. [11] (https://github.com/yuchen-zhu/kernel_proxies). Grade Retention: We evaluate the effect of grade retention on long-term cognitive development using data from the ECLS-K panel study [10, 53], following the setup of Mastouri et al. [11]. Job Corps: Numerical experiments with the Job Corps dataset [54, 55], which we accessed through the public repository provided by Singh et al. [69] (https://github.com/liyuan9988/KernelCausalFunction/tree/master).
Dataset Splits	Yes	Table (F.2) (Data splitting for KPV and KAP). KPV (Stage 1): $\{(w_i, z_i, a_i)\}_{i=1}^{n_h}$, using $n_h = t/2$ random samples. KPV (Stage 2): $\{(y_i, z_i, a_i)\}_{i=1}^{m_h}$, using $m_h = t - n_h$ samples, disjoint from Stage 1. KPV (Stage 3): $\{w_i\}_{i=1}^{t_h}$, using the full set, $t_h = t$. KAP (Stage 1): $\{(w_i, z_i, a_i)\}_{i=1}^{n_\phi}$, using $n_\phi = t/2$ random samples. KAP (Stage 2): $\{(w_i, a_i)\}_{i=1}^{m_\phi}$, using $m_\phi = t - n_\phi$ samples, disjoint from Stage 1. KAP (Stage 3): $\{(y_i, z_i, a_i)\}_{i=1}^{t_\phi}$, using the full set, $t_\phi = t$.
Hardware Specification	No	We did not report compute resources, as our methods are lightweight and all experiments can be run on a standard personal computer with Python installed.
Software Dependencies	No	We include our Python code in the supplementary zip file, along with reproduction instructions and the required packages.
Experiment Setup	Yes	We determine the regularization parameter $\lambda_{DR}$ by utilizing the closed-form expression for leave-one-out cross-validation (LOOCV) in kernel ridge regression. Either LOOCV or a held-out validation set is applied for the regularization terms in the treatment and outcome bridge methods, in line with prior approaches in [11, 12, 23]. For $\lambda_{DR}$, we use a grid search over 25 logarithmically spaced values in the range $[5 \times 10^{-5}, 1]$ to minimize the LOOCV loss. In our experiments, we used a grid of 25 logarithmically spaced values in the range $[5 \times 10^{-5}, 1]$. (S.M. F.3.2, KPV second-stage regression) In our experiments, we used a grid of 25 logarithmically spaced values in the range $[5 \times 10^{-5}, 1]$ to tune $\lambda_{\phi,2}$. Following the complexity regularization parameters in Bozkurt et al. [23], we set $\sigma_\phi = 1$ for the synthetic low-dimensional setting and the Legalized Abortion and Crime dataset, and $\sigma_\phi = 3$ for the dSprite and Grade Retention datasets. (S.M. F.3.2, KAP second-stage regression) We use a grid of 25 logarithmically spaced values in the range $[5 \times 10^{-5}, 10^{-3}]$ to tune $\lambda_{MMR}$, with 10% of the training set held out as a validation set. (S.M. F.3.2, PMMR regularization parameter tuning) The kernel bandwidth $l$ is selected using the median heuristic based on pairwise distances.
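The kernel and tuning choices quoted in the table (a Gaussian kernel whose bandwidth comes from the median heuristic, and a regularization parameter chosen by minimizing the closed-form kernel ridge regression LOOCV loss over 25 log-spaced values in $[5 \times 10^{-5}, 1]$) can be illustrated with a minimal NumPy sketch. It is not the authors' code: all function names are hypothetical, the median heuristic shown (median of the nonzero pairwise Euclidean distances) is one common variant, and the ridge penalty is assumed to be scaled as $n\lambda$, giving the hat matrix $H = K(K + n\lambda I)^{-1}$. Under that assumption the leave-one-out residual for sample $i$ is $(y_i - \hat y_i)/(1 - H_{ii})$, so each grid point needs one linear solve instead of $n$ refits.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X (n, d) and Y (m, d)."""
    d2 = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)  # clip small negative values caused by round-off


def median_bandwidth(X):
    """Median heuristic (one common variant): median of the nonzero pairwise distances."""
    d = np.sqrt(sq_dists(X, X))
    upper = d[np.triu_indices_from(d, k=1)]
    return float(np.median(upper[upper > 0]))


def gaussian_kernel(X, Y, l):
    """Gaussian kernel k(x, y) = exp(-||x - y||_2^2 / (2 l^2))."""
    return np.exp(-sq_dists(X, Y) / (2.0 * l**2))


def krr_loocv_loss(K, y, lam):
    """Closed-form LOOCV mean squared error for kernel ridge regression.

    Assumption: the penalty is scaled as n * lam, so H = K (K + n*lam*I)^{-1}.
    """
    n = K.shape[0]
    # solve() returns (K + cI)^{-1} K, which equals K (K + cI)^{-1} since the two commute.
    H = np.linalg.solve(K + n * lam * np.eye(n), K)
    residuals = y - H @ y
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals**2))


def select_lambda_loocv(K, y, grid):
    """Return the grid value with the smallest LOOCV loss, plus all losses."""
    losses = np.array([krr_loocv_loss(K, y, lam) for lam in grid])
    return float(grid[int(np.argmin(losses))]), losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    K = gaussian_kernel(X, X, median_bandwidth(X))
    grid = np.logspace(np.log10(5e-5), 0.0, 25)  # 25 log-spaced values in [5e-5, 1]
    lam_best, _ = select_lambda_loocv(K, y, grid)
    print(f"selected lambda: {lam_best:.2e}")
```

The closed-form shortcut is exact only when every leave-one-out fit reuses the same penalty $n\lambda$ as the full fit; under a different penalty convention the selected value could shift slightly.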
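The three-stage split in Table (F.2) amounts to index bookkeeping: Stage 1 draws $t/2$ random samples, Stage 2 takes the remaining $t - t/2$ samples (disjoint from Stage 1), and Stage 3 reuses all $t$. The helper below is a hypothetical sketch, not the authors' implementation; it assumes $t/2$ is rounded down when $t$ is odd and uses synthetic placeholder arrays only to show which variables each KPV stage receives.

```python
import numpy as np


def two_stage_split(t, seed=0):
    """Hypothetical helper: index sets for the three-stage KPV/KAP split.

    Stage 1: n = t // 2 random indices (rounding t/2 down is an assumption).
    Stage 2: the remaining t - n indices, disjoint from Stage 1.
    Stage 3: all t indices.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(t)
    n = t // 2
    stage1 = np.sort(perm[:n])
    stage2 = np.sort(perm[n:])
    stage3 = np.arange(t)
    return stage1, stage2, stage3


if __name__ == "__main__":
    t = 10
    rng = np.random.default_rng(1)
    # Synthetic placeholders: W, Z (proxies), A (treatment), Y (outcome).
    W, Z, A, Y = (rng.normal(size=(t, 1)) for _ in range(4))
    s1, s2, s3 = two_stage_split(t)
    kpv_stage1 = (W[s1], Z[s1], A[s1])  # {(w_i, z_i, a_i)}, n_h samples
    kpv_stage2 = (Y[s2], Z[s2], A[s2])  # {(y_i, z_i, a_i)}, m_h = t - n_h samples
    kpv_stage3 = W[s3]                  # {w_i}, all t samples
    print(len(s1), len(s2), len(s3))
```

For KAP the same index sets would apply, with Stage 2 receiving $(w_i, a_i)$ and Stage 3 receiving $(y_i, z_i, a_i)$ over all $t$ samples.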