Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Authors: Dennis Frauen, Konstantin Hess, Stefan Feuerriegel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our IVW-DR-learner achieves superior performance in our experiments, particularly in regimes with low overlap and long time horizons. ... In this section, we compare our proposed meta-learners empirically. ... We simulate three datasets D_j with j ∈ {1, 2, 3} from different data-generating processes. ... Real-world dataset. We sample n = 3000 patient trajectories electronic health records over up to T = 10 time points from the MIMIC III dataset (Johnson et al., 2016). |
| Researcher Affiliation | Academia | Dennis Frauen, Konstantin Hess & Stefan Feuerriegel, LMU Munich, Munich Center of Machine Learning (MCML), EMAIL |
| Pseudocode | No | The paper describes the methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/DennisFrauen/CATEMetaLearnersTime. |
| Open Datasets | Yes | Real-world dataset. We sample n = 3000 patient trajectories electronic health records over up to T = 10 time points from the MIMIC III dataset (Johnson et al., 2016). |
| Dataset Splits | Yes | We sample a training dataset of size n_train = 5000 and a test dataset of size n_test = 1000. ... We sample a training dataset of size n_train = 10000 and a test dataset of size n_test = 1000. |
| Hardware Specification | Yes | For each transformer-based learner, training took approximately 90 seconds using n = 5000 samples and a standard computer with AMD Ryzen 7 Pro CPU and 32GB of RAM. |
| Software Dependencies | No | The paper mentions using a transformer-based architecture (Vaswani et al., 2017) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for any software libraries or programming languages. |
| Experiment Setup | Yes | Further details regarding architecture, training, hyperparameters, and runtime are in Appendix E. ... Each block consists of (i) a self-attention mechanism with three attention heads and hidden state dimension d_model = 30, and (ii) a feed-forward network with hidden layer size d_ff = 20. Both (i) the self-attention mechanism and (ii) the feed-forward network employ residual connections, each followed by a dropout layer with dropout probability p = 0.1. ... We employ additional weight decay for the two-stage learners to avoid overfitting during the pseudo-outcome regression. |
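
The block hyperparameters quoted in the Experiment Setup row can be collected in a small configuration object. The following is a minimal sketch, not the authors' code: the class name `BlockConfig` and its methods are hypothetical, and the parameter count assumes a standard transformer block (four square attention projections with biases, a two-layer feed-forward network with biases, no normalization layers), which the quoted text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockConfig:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""

    n_heads: int = 3      # attention heads per block
    d_model: int = 30     # hidden state dimension
    d_ff: int = 20        # feed-forward hidden layer size
    dropout: float = 0.1  # dropout probability after each residual connection

    def head_dim(self) -> int:
        # Standard multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    def approx_param_count(self) -> int:
        # ASSUMPTION: Q, K, V and output projections are d_model x d_model with biases;
        # the feed-forward network is d_model -> d_ff -> d_model with biases;
        # normalization layers, embeddings and output heads are not counted.
        attention = 4 * (self.d_model * self.d_model + self.d_model)
        feed_forward = (self.d_model * self.d_ff + self.d_ff) + (
            self.d_ff * self.d_model + self.d_model
        )
        return attention + feed_forward


config = BlockConfig()
print(config.head_dim())            # per-head dimension
print(config.approx_param_count())  # rough size of one block under the assumptions above
```

With the quoted values, each of the three heads operates on 10 dimensions. The small per-block size under these assumptions is in line with the short CPU training time reported in the Hardware Specification row.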