Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Treatment Effect Estimation for Optimal Decision-Making
Authors: Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Mihaela van der Schaar, Stefan Feuerriegel
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we confirm the effectiveness of our method both empirically and theoretically. (Abstract) ... We now confirm the effectiveness of our proposed learning algorithm empirically. As is standard in causal inference [50, 9, 33], we use data where we have access to ground-truth values of causal quantities. We also provide experimental results using real-world data. (Section 5, Experiments) |
| Researcher Affiliation | Academia | Dennis Frauen (LMU Munich; Munich Center for Machine Learning); Valentyn Melnychuk (LMU Munich; Munich Center for Machine Learning); Jonas Schweisthal (LMU Munich; Munich Center for Machine Learning); Mihaela van der Schaar (University of Cambridge); Stefan Feuerriegel (LMU Munich; Munich Center for Machine Learning) |
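| Pseudocode | Yes | Algorithm 1: Re-targeted CATE estimation (PT-CATE). 1: Input: Training data $\{(x_i, a_i, y_i)\}_{i=1}^{n}$, pseudo-outcome type $m$, trade-off $\gamma \in [0, 1]$, learning rates $\eta_g, \eta_\alpha$, epochs $E_1, E_2, E_3$, iterations $K$. 2: Stage 1: Estimate nuisance functions $\hat{\eta} = (\hat{\mu}_1, \hat{\mu}_0, \hat{\pi}_b)$; compute pseudo-outcomes $\{y^{m}_{\hat{\eta},i}\}$. 3: Stage 2: Initialize parameters $\theta$ (for $g$) and $\phi$ (for $\alpha$). 4: for epoch $= 1, \dots, E_1$ do 5: $\theta \leftarrow \theta - \eta_g \nabla_\theta \hat{L}^{m}_{0,\alpha_\phi,\hat{\eta}}(g_\theta)$ {Step 1} 6: end for 7: for iter $= 1, \dots, K$ do 8: for epoch $= 1, \dots, E_2$ do 9: $\phi \leftarrow \phi - \eta_\alpha \nabla_\phi \hat{L}^{m}_{\gamma,g_\theta,\hat{\eta}}(\alpha_\phi)$ {Step 2} 10: end for 11: for epoch $= 1, \dots, E_3$ do 12: $\theta \leftarrow \theta - \eta_g \nabla_\theta \hat{L}^{m}_{\gamma,\alpha_\phi,\hat{\eta}}(g_\theta)$ {Step 3} 13: end for 14: end for 15: Output: $g_\theta$ and $\alpha_\phi$. |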
| Open Source Code | Yes | Code is available at https://github.com/DennisFrauen/CATEForPolicy. (Footnote 2) ... Code is provided. (Section 5, Evaluation) ... Code is available at https://github.com/DennisFrauen/CATEForPolicy. (Footnote 3) |
| Open Datasets | Yes | Real-world data. Dataset. Here, we provide additional experimental results using the Hillstrom Email Marketing dataset of n = 64000 customers. Details regarding the dataset and our preprocessing are in Appendix E. (Section 5) ... The data is taken from https://causeinfer.readthedocs.io/en/latest/data/hillstrom.html. (Appendix E) |
| Dataset Splits | Yes | We split the data into a training dataset with 50% of the data, a validation set with 20%, and a test set with 30% of the data. (Appendix E) |
| Hardware Specification | Yes | Runtime. For the second-stage models, training took approximately two minutes using n = 2000 samples and a standard computer with AMD Ryzen 7 Pro CPU and 32GB of RAM. (Appendix C) |
| Software Dependencies | No | The paper mentions using "standard feed-forward neural networks" and the "Adam optimizer", but does not provide specific version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Implementation details. We use standard feed-forward neural networks with tanh activations for gθ and with ReLU activations for αϕ. We use ρ(x) + a as the final activation function for αϕ to ensure αϕ(x) > a, where ρ(x) denotes the softplus function. We perform training using the Adam optimizer [36]. Further details regarding architecture, training, and hyperparameters are in Appendix C. (Section 5) ... Hyperparameters. To ensure a fair comparison, we use the same hyperparameters for each second-stage learner across different γ and random seeds. For reproducibility purposes, we report the hyperparameters used (e.g., dimensions, learning rate) for all experiments and models (including nuisance functions) as .yaml files. (Appendix C) |
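To make the control flow of Algorithm 1 (quoted in the Pseudocode row) easier to follow, here is a minimal Python sketch of Stage 2 only. It is not the authors' implementation; their repository holds that. Assumptions: parameters are plain lists of floats; `grad_g` and `grad_alpha` are hypothetical gradient oracles standing in for the gradients of $\hat{L}^m$ with respect to $\theta$ and $\phi$ (the losses themselves are defined in the paper, not here); each "epoch" is a single full-batch step; and updates use plain gradient descent rather than the Adam optimizer the paper reports.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical gradient oracle: (theta, phi, gamma) -> gradient vector.
GradFn = Callable[[Vector, Vector, float], Vector]


def gd_step(params: Vector, grad: Vector, lr: float) -> Vector:
    """Plain gradient-descent update: params <- params - lr * grad."""
    return [p - lr * g for p, g in zip(params, grad)]


def pt_cate_stage2(
    theta: Vector,
    phi: Vector,
    grad_g: GradFn,      # stands in for the gradient of L-hat w.r.t. theta
    grad_alpha: GradFn,  # stands in for the gradient of L-hat w.r.t. phi
    gamma: float,
    lr_g: float,
    lr_alpha: float,
    e1: int,
    e2: int,
    e3: int,
    k: int,
) -> Tuple[Vector, Vector]:
    """Control flow of Stage 2 of Algorithm 1 (PT-CATE), with losses abstracted away."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("trade-off gamma must lie in [0, 1]")
    # Step 1: E1 epochs on g with the trade-off subscript set to 0.
    for _ in range(e1):
        theta = gd_step(theta, grad_g(theta, phi, 0.0), lr_g)
    for _ in range(k):
        # Step 2: E2 epochs on alpha while g_theta is held fixed.
        for _ in range(e2):
            phi = gd_step(phi, grad_alpha(theta, phi, gamma), lr_alpha)
        # Step 3: E3 epochs on g while alpha_phi is held fixed.
        for _ in range(e3):
            theta = gd_step(theta, grad_g(theta, phi, gamma), lr_g)
    return theta, phi
```

The sketch is meant to show the alternating structure: a warm-up Step 1, then Steps 2 and 3 repeated K times. Replacing `gd_step` with an Adam update would bring it closer to the training setup described in the Experiment Setup row.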
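The Dataset Splits row reports a 50% / 20% / 30% train/validation/test split of the Hillstrom data. The quote does not say how rows were assigned to each part, so the sketch below assumes a uniformly random permutation with a fixed seed. `split_50_20_30` is a hypothetical helper, not code from the authors' repository. Integer arithmetic keeps the split sizes exact.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_50_20_30(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle indices with a fixed seed and cut them 50% / 20% / 30%."""
    n = len(items)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # assumption: random assignment of rows
    n_train = n * 50 // 100
    n_val = n * 20 // 100
    train = [items[i] for i in order[:n_train]]
    val = [items[i] for i in order[n_train:n_train + n_val]]
    test = [items[i] for i in order[n_train + n_val:]]  # remainder, about 30%
    return train, val, test
```

Applied to all n = 64000 customers (the preprocessing in Appendix E may change this count), the parts would contain 32000, 12800, and 19200 rows.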
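The Experiment Setup row describes the final activation ρ(x) + a for αϕ, with ρ the softplus function, so that αϕ(x) > a. Below is a minimal stdlib sketch of that activation. It uses the numerically stable form softplus(z) = max(z, 0) + log1p(exp(−|z|)). The function names are illustrative and do not come from the authors' code.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable softplus: log(1 + exp(z)) without overflow for large z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def lower_bounded_output(z: float, a: float) -> float:
    """Final activation rho(z) + a, which is mathematically always > a.

    In floating point, for very negative z, softplus(z) can fall below the
    spacing of floats around a, so the sum may round to exactly a.
    """
    return softplus(z) + a
```

The stable form avoids overflow in `math.exp` for large positive inputs, where the naive `math.log(1 + math.exp(z))` would raise an `OverflowError`.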