Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Treatment Effect Estimation for Optimal Decision-Making

Authors: Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Mihaela van der Schaar, Stefan Feuerriegel

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we confirm the effectiveness of our method both empirically and theoretically. (Abstract) ... We now confirm the effectiveness of our proposed learning algorithm empirically. As is standard in causal inference [50, 9, 33], we use data where we have access to ground-truth values of causal quantities. We also provide experimental results using real-world data. (Section 5, Experiments)
Researcher Affiliation	Academia	Dennis Frauen (LMU Munich; Munich Center for Machine Learning), Valentyn Melnychuk (LMU Munich; Munich Center for Machine Learning), Jonas Schweisthal (LMU Munich; Munich Center for Machine Learning), Mihaela van der Schaar (University of Cambridge), Stefan Feuerriegel (LMU Munich; Munich Center for Machine Learning)
Pseudocode	Yes	Algorithm 1: Re-targeted CATE estimation (PT-CATE) 1: Input: Training data {(x_i, a_i, y_i)}_{i=1}^n, pseudo-outcome type m, trade-off γ ∈ [0, 1], learning rates η_g, η_α, epochs E_1, E_2, E_3, iterations K. 2: Stage 1: Estimate nuisance functions η̂ = (μ̂_1, μ̂_0, π̂_b); compute pseudo-outcomes {y^m_{η̂,i}}. 3: Stage 2: Initialize parameters θ (for g) and ϕ (for α). 4: for epoch = 1, ..., E_1 do 5: θ ← θ − η_g ∇_θ L̂^m_{0,α_ϕ,η̂}(g_θ) {Step 1} 6: end for 7: for iter = 1, ..., K do 8: for epoch = 1, ..., E_2 do 9: ϕ ← ϕ − η_α ∇_ϕ L̂^m_{γ,g_θ,η̂}(α_ϕ) {Step 2} 10: end for 11: for epoch = 1, ..., E_3 do 12: θ ← θ − η_g ∇_θ L̂^m_{γ,α_ϕ,η̂}(g_θ) {Step 3} 13: end for 14: end for 15: Output: g_θ and α_ϕ.
Open Source Code	Yes	Code is available at https://github.com/DennisFrauen/CATEForPolicy. (Footnote 2) ... Code is provided. (Section 5, Evaluation) ... Code is available at https://github.com/DennisFrauen/CATEForPolicy. (Footnote 3)
Open Datasets	Yes	Real-world data. Dataset. Here, we provide additional experimental results using the Hillstrom Email Marketing dataset of n = 64000 customers. Details regarding the dataset and our preprocessing are in Appendix E. (Section 5) ... The data is taken from https://causeinfer.readthedocs.io/en/latest/data/hillstrom.html. (Appendix E)
Dataset Splits	Yes	We split the data into a training dataset with 50% of the data, a validation set with 20%, and a test set with 30% of the data. (Appendix E)
Hardware Specification	Yes	Runtime. For the second-stage models, training took approximately two minutes using n = 2000 samples and a standard computer with AMD Ryzen 7 Pro CPU and 32GB of RAM. (Appendix C)
Software Dependencies	No	The paper mentions using "standard feed-forward neural networks" and the "Adam optimizer", but does not provide specific version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	Implementation details. We use standard feed-forward neural networks with tanh activations for g_θ and with ReLU activations for α_ϕ. We use ρ(x) + a as the final activation function for α_ϕ to ensure α_ϕ(x) > a, where ρ(x) denotes the softplus function. We perform training using the Adam optimizer [36]. Further details regarding architecture, training, and hyperparameters are in Appendix C. (Section 5) ... Hyperparameters. To ensure a fair comparison, we use the same hyperparameters for each second-stage learner across different γ and random seeds. For reproducibility purposes, we report the hyperparameters used (e.g., dimensions, learning rate) for all experiments and models (including nuisance functions) as .yaml files. (Appendix C)
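
The control flow of Stage 2 in Algorithm 1 (Pseudocode row) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the update lines are plain gradient-descent steps, it omits Stage 1 (nuisance estimation), and the gradient callables `grad_g` and `grad_alpha` are hypothetical stand-ins for gradients of the empirical loss, which this page does not define. The toy quadratic gradients in the usage lines are likewise invented for demonstration only.

```python
def alternating_descent(theta, phi, grad_g, grad_alpha, gamma,
                        lr_g, lr_alpha, E1, E2, E3, K):
    """Alternate gradient steps on theta (for g) and phi (for alpha).

    grad_g(theta, phi, gamma) and grad_alpha(theta, phi, gamma) are
    hypothetical callables returning the gradient of the empirical loss
    with respect to theta and phi, respectively.
    """
    # Step 1: warm-start g with the trade-off fixed at 0.
    for _ in range(E1):
        theta = theta - lr_g * grad_g(theta, phi, 0.0)
    for _ in range(K):
        # Step 2: update alpha while g is held fixed.
        for _ in range(E2):
            phi = phi - lr_alpha * grad_alpha(theta, phi, gamma)
        # Step 3: update g while alpha is held fixed.
        for _ in range(E3):
            theta = theta - lr_g * grad_g(theta, phi, gamma)
    return theta, phi


# Toy usage with scalar parameters and made-up quadratic losses
# (theta - 1)^2 and (phi - 2)^2; these are NOT the paper's losses.
theta, phi = alternating_descent(
    theta=0.0, phi=0.0,
    grad_g=lambda th, ph, g: 2.0 * (th - 1.0),
    grad_alpha=lambda th, ph, g: 2.0 * (ph - 2.0),
    gamma=0.5, lr_g=0.1, lr_alpha=0.1, E1=50, E2=20, E3=20, K=5,
)
print(theta, phi)
```

In practice the parameters would be network weights updated by an optimizer such as Adam, as the Experiment Setup row states; scalars are used here only to keep the sketch dependency-free.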
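
The 50% / 20% / 30% split quoted in the Dataset Splits row can be illustrated with a short sketch. The function name `split_indices`, the shuffling step, and the seed are assumptions made for this example; the quoted text does not say whether or how the data were shuffled.

```python
import random


def split_indices(n, fractions=(0.5, 0.2, 0.3), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    # The seed is an assumption for repeatability, not taken from the paper.
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# n = 64000 matches the dataset size quoted in the Open Datasets row.
train, val, test = split_indices(64000)
print(len(train), len(val), len(test))  # 32000 12800 19200
```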
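
The final activation described in the Experiment Setup row, softplus plus a constant a, keeps the output above a because softplus is strictly positive. A minimal sketch follows; the function names and the value a = 0.1 are chosen for illustration and do not come from the paper.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def shifted_softplus(x, a):
    """Softplus shifted by a, so the output is bounded below by a.

    Mathematically the bound is strict; in floating point it can be
    attained for very negative x, where exp(-|x|) underflows to 0.
    """
    return softplus(x) + a


a = 0.1  # illustrative lower bound, not a value from the paper
for x in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(x, shifted_softplus(x, a))
```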