Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Model-agnostic meta-learners for estimating heterogeneous treatment effects over time

Authors: Dennis Frauen, Konstantin Hess, Stefan Feuerriegel

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our IVW-DR-learner achieves superior performance in our experiments, particularly in regimes with low overlap and long time horizons. ... In this section, we compare our proposed meta-learners empirically. ... We simulate three datasets D_j with j ∈ {1, 2, 3} from different data-generating processes. ... Real-world dataset. We sample n = 3000 patient trajectories from electronic health records over up to T = 10 time points from the MIMIC III dataset (Johnson et al., 2016).
Researcher Affiliation	Academia	Dennis Frauen, Konstantin Hess & Stefan Feuerriegel, LMU Munich; Munich Center for Machine Learning (MCML)
Pseudocode	No	The paper describes the methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/DennisFrauen/CATEMetaLearnersTime.
Open Datasets	Yes	Real-world dataset. We sample n = 3000 patient trajectories from electronic health records over up to T = 10 time points from the MIMIC III dataset (Johnson et al., 2016).
Dataset Splits	Yes	We sample a training dataset of size n_train = 5000 and a test dataset of size n_test = 1000. ... We sample a training dataset of size n_train = 10000 and a test dataset of size n_test = 1000.
Hardware Specification	Yes	For each transformer-based learner, training took approximately 90 seconds using n = 5000 samples and a standard computer with AMD Ryzen 7 Pro CPU and 32GB of RAM.
Software Dependencies	No	The paper mentions using a transformer-based architecture (Vaswani et al., 2017) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for any software libraries or programming languages.
Experiment Setup	Yes	Further details regarding architecture, training, hyperparameters, and runtime are in Appendix E. ... Each block consists of (i) a self-attention mechanism with three attention heads and hidden state dimension d_model = 30, and (ii) a feed-forward network with hidden layer size d_ff = 20. Both (i) the self-attention mechanism and (ii) the feed-forward network employ residual connections, which are followed by dropout layers with dropout probability p = 0.1. ... We employ additional weight decay for the two-stage learners to avoid overfitting during the pseudo-outcome regression.
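The per-block hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is hypothetical: the class and field names are illustrative and are not taken from the authors' repository, and the divisibility check assumes the standard multi-head attention design in which d_model is split evenly across heads (30 / 3 = 10 dimensions per head). The weight-decay value is not reported in this summary, so it is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerBlockConfig:
    """Hypothetical container for the per-block hyperparameters reported above.

    Field names are illustrative; they are not taken from the authors' code.
    """
    n_heads: int = 3        # number of self-attention heads
    d_model: int = 30       # hidden state dimension
    d_ff: int = 20          # hidden layer size of the feed-forward network
    dropout: float = 0.1    # dropout probability after each residual sub-layer

    def __post_init__(self) -> None:
        # Assumption: standard multi-head attention splits d_model evenly across heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        # Basic sanity check on the dropout probability.
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must lie in [0, 1)")

    @property
    def head_dim(self) -> int:
        # Dimension of the query/key/value vectors within each attention head.
        return self.d_model // self.n_heads


cfg = TransformerBlockConfig()
print(cfg.head_dim)  # 10
```

Making the dataclass frozen keeps the reported settings from being changed accidentally between runs, and validating in `__post_init__` surfaces an inconsistent head count (for example, `n_heads=4` with `d_model=30`) before any model is built.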