Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings

Authors: Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations illustrate the practical benefits of CPME over existing methods. ... Section 6 reports numerical results ... In this section, we present numerical simulations for testing and sampling from the counterfactual distributions. Full experimental details, including additional simulations, are provided in Appendix 14. All code and simulation materials used in this study are publicly available at https://github.com/houssamzenati/counterfactual-policy-mean-embedding.
Researcher Affiliation | Collaboration | Houssam Zenati (Gatsby Computational Neuroscience Unit, University College London; EMAIL); Bariscan Bozkurt (Gatsby Computational Neuroscience Unit, University College London; EMAIL); Arthur Gretton (Gatsby Computational Neuroscience Unit, University College London; Google DeepMind; EMAIL)
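Pseudocode | Yes | Algorithm 1 DR-KPT. Require: Data $D = (x_i, a_i, y_i)_{i=1}^{n}$, kernels $k_{\mathcal{Y}}$, $k_{\mathcal{A},\mathcal{X}}$. Ensure: The p-value of the test. 1: Set $m = n/2$ and estimate $\hat{\mu}_{Y|A,X}$, $\hat{\pi}_0$ on the first $m$ samples, $\tilde{\mu}_{Y|A,X}$, $\tilde{\pi}_0$ on the remaining $n - m$. 2: Define $\hat{\varphi}(y, a, x) = \left\{ \frac{\pi(a|x)}{\hat{\pi}_0(a|x)} - \frac{\pi'(a|x)}{\hat{\pi}_0(a|x)} \right\} \left\{ \phi_{\mathcal{Y}}(y) - \hat{\mu}_{Y|A,X}(a, x) \right\} + \hat{\beta}_{\pi}(x) - \hat{\beta}_{\pi'}(x)$ and $\tilde{\varphi}$. 3: Define $f_{\pi,\pi'}(y_i, a_i, x_i) = \frac{1}{n-m} \sum_{j=m+1}^{n} \langle \hat{\varphi}(y_i, a_i, x_i), \tilde{\varphi}(y_j, a_j, x_j) \rangle$ for $i = 1, \dots, m$. 4: Calculate $\bar{f}_{\pi,\pi'}$ and $S_{\pi,\pi'}$ using Equation (14), then $T_{\pi,\pi'} = \sqrt{m}\, \bar{f}_{\pi,\pi'} / S_{\pi,\pi'}$. 5: return p-value $p = 1 - \Phi(T_{\pi,\pi'})$. ... Algorithm 2 Sampling from the counterfactual distribution ... Algorithm 3 Plug-in estimator of the CPME (Discrete actions) ... Algorithm 4 Plug-in estimator of the CPME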
Open Source Code | Yes | All code and simulation materials used in this study are publicly available at https://github.com/houssamzenati/counterfactual-policy-mean-embedding. ... All the code to reproduce our numerical simulations is provided in the supplementary material and will be open-sourced upon acceptance of the manuscript.
Open Datasets | Yes | Warfarin dataset. We use the publicly available dataset on Warfarin dosage [68], which contains patient covariates and expert-prescribed therapeutic doses. ... dSprites (Structured Outcomes). We perform experiments on the dSprites dataset [70, 71], which enables evaluation on structured image outcomes.
Dataset Splits | Yes | For DR-KPT the regularization parameter λ is selected via 3-fold cross-validation in the range $\{10^{-4}, \dots, 10^{0}\}$ ... Each estimator is tuned by 5-fold cross-validation procedure for OPE setting introduced in [17, Appendix B]
Hardware Specification | Yes | Operating System: Linux (kernel version 6.8.0-55-generic); GPU: NVIDIA RTX A4500; Driver Version: 560.35.05; CUDA Version: 12.6; Memory: 20 GB GDDR6
Software Dependencies | Yes | Driver Version: 560.35.05; CUDA Version: 12.6
Experiment Setup | Yes | For DR-KPT the regularization parameter λ is selected via 3-fold cross-validation in the range $\{10^{-4}, \dots, 10^{0}\}$, as done in [17]. We use the median heuristic for the lengthscales of the kernel $k_{\mathcal{A}}$, $k_{\mathcal{X}}$ and $k_{\mathcal{Y}}$. ... We set the covariate dimension to d = 5, γ = 1 and evaluate β in the grid β = [0.1, 0.2, 0.3, 0.4, 0.5] ... The results in Table 1 show that DR-KPT is well-calibrated under the null (Scenario I) with near-nominal rejection rates. ... For the DM and DR-NN models, we vary the number of hidden units $n_h \in \{50, 100, 150, 200\}$. For CPME and DR-CPME, the regularization parameter λ is selected from the range $\{10^{-8}, \dots, 10^{-3}\}$.
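
Illustration (not taken from the paper or its repository): the rows above mention a median heuristic for kernel lengthscales, a log-spaced grid for the regularization parameter λ, and a final p-value of the form 1 − Φ(T) in Algorithm 1. The Python sketch below shows one plausible reading of each piece. All function names are hypothetical. Finite-dimensional feature vectors stand in for the RKHS-valued terms of Algorithm 1, the plain sample standard deviation stands in for the quantity S of Equation (14) (which is not reproduced here), and the studentized form T = sqrt(m) * mean(f) / S is an assumption.

```python
import math

import numpy as np


def median_heuristic(X):
    """Median pairwise Euclidean distance between distinct rows of X.

    A common choice of kernel lengthscale; whether the authors apply it to
    distances or squared distances is not stated in the excerpt (assumption).
    """
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)  # each unordered pair once
    return float(np.median(dists[upper]))


def log_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return 10.0 ** np.arange(low_exp, high_exp + 1)


def cross_test_pvalue(phi_hat, phi_tilde):
    """One-sided p-value from a studentized cross statistic.

    phi_hat:   (m, d) array, hypothetical features on the first fold.
    phi_tilde: (n - m, d) array, hypothetical features on the second fold.
    """
    phi_hat = np.asarray(phi_hat, dtype=float)
    phi_tilde = np.asarray(phi_tilde, dtype=float)
    # f_i = average over j of <phi_hat_i, phi_tilde_j>
    f = phi_hat @ phi_tilde.mean(axis=0)
    m = len(f)
    f_bar = f.mean()
    s = f.std()  # stand-in for S from Equation (14) (assumption)
    t = math.sqrt(m) * f_bar / s
    p = 0.5 * math.erfc(t / math.sqrt(2.0))  # equals 1 - Phi(t)
    return t, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))  # d = 5 covariates, as in the setup row
    print("lengthscale:", median_heuristic(X))
    print("lambda grid:", log_grid(-4, 0))
    print("t, p:", cross_test_pvalue(rng.normal(size=(25, 3)),
                                     rng.normal(size=(25, 3))))
```

In a real run the lengthscale and λ would be chosen per kernel and per fold by cross-validation; the sketch only shows how the candidate values and the final p-value could be formed.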