Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
ODE Discovery for Longitudinal Heterogeneous Treatment Effects Inference
Authors: Krzysztof Kacprzyk, Samuel Holt, Jeroen Berrevoets, Zhaozhi Qian, Mihaela van der Schaar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using our framework, we build an example method (called INSITE), tested in accepted benchmark settings used throughout the literature. |
| Researcher Affiliation | Academia | Krzysztof Kacprzyk University of Cambridge Samuel Holt University of Cambridge Jeroen Berrevoets University of Cambridge Zhaozhi Qian University of Cambridge Mihaela van der Schaar University of Cambridge |
| Pseudocode | Yes | Algorithm 1 Individualized Nonlinear Sparse Identification Treatment Effect (INSITE) |
| Open Source Code | Yes | We provide all code at https://github.com/samholt/ODE-Discovery-for-Longitudinal-Heterogeneous-Treatment-Effects-Inference. |
| Open Datasets | Yes | We generate a dataset for each underlying pharmacological model F and a given action policy... This forms a dataset as described in Section 3... Here, to explore continuous types of treatments, we use a continuous chemotherapy treatment c(t) and a binary radiotherapy treatment d(t), both changing over time. For both models, y = x is the volume of the tumor t days after diagnosis, modeled separately as a piecewise expression (cases a = 0 and a = 1, involving $C_0$, $C_1$, $V$, and $x(t)$) (5) and as $\frac{dx(t)}{dt} = \Big(\underbrace{\rho \log \tfrac{K}{x(t)}}_{\text{Tumor growth}} - \underbrace{\beta_c C(t)}_{\text{Chemotherapy}} - \underbrace{(\alpha_r d(t) + \beta_r d(t)^2)}_{\text{Radiotherapy}} + \underbrace{e_t}_{\text{Noise}}\Big)\, x(t)$, where the parameters $C_0, C_1, V, \rho, K, \gamma, \alpha, \beta, e_t$ are sampled according to the different layers of between-subject variability (table 2) forming variations of A-D, with parameter distributions following that as described in Geng et al. (2017) or otherwise detailed in appendix F. |
| Dataset Splits | Yes | With 1000 training trajectories, 100 validation trajectories and 100 test trajectories, unless otherwise noted. |
| Hardware Specification | Yes | We perform all experiments and training using a single Intel Core i9-12900K CPU @ 3.20GHz, 64GB RAM with an Nvidia RTX3090 GPU 24GB. |
| Software Dependencies | No | The paper states, 'all the baselines are implemented in PyTorch Lightning (Falcon, 2019) and trained with the Adam optimizer (Kingma & Ba, 2014)'. While it names software components and cites papers, it does not provide specific version numbers for PyTorch, Python, or other libraries, which is required for reproducibility. |
| Experiment Setup | Yes | The hyperparameters are: the propensity treatment model has 8 sequential hidden units, a dropout rate of 0.1, one layer, uses a batch size of 64, with a max grad norm of 2.0, and is optimized with the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001... Here the regularization parameter λ is set to λ = 10.0 across all experiments. |
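
To make the Open Datasets and Dataset Splits rows concrete, the following is a minimal sketch of how synthetic tumor trajectories might be generated and split 1000/100/100. It assumes the standard tumor growth form from Geng et al. (2017) with forward-Euler integration. The function names (`simulate_tumor`, `make_splits`), all parameter values, and the random treatment policy are hypothetical placeholders for illustration; they are not taken from the paper or its repository.

```python
import numpy as np


def simulate_tumor(x0, chemo, radio, rho, K, beta_c, alpha_r, beta_r,
                   dt=1.0, noise_std=0.0, rng=None):
    """Forward-Euler integration of
    dx/dt = (rho*log(K/x) - beta_c*C(t) - (alpha_r*d(t) + beta_r*d(t)^2) + e_t) * x.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.empty(len(chemo) + 1)
    x[0] = x0
    for t in range(len(chemo)):
        e_t = rng.normal(0.0, noise_std)  # per-step noise term
        rate = (rho * np.log(K / x[t])
                - beta_c * chemo[t]
                - (alpha_r * radio[t] + beta_r * radio[t] ** 2)
                + e_t)
        # Clamp to keep the volume strictly positive for the next log().
        x[t + 1] = max(x[t] + dt * rate * x[t], 1e-8)
    return x


def make_splits(n_train=1000, n_val=100, n_test=100, horizon=30, seed=0):
    """Simulate trajectories under a random policy and split them.

    Every numeric value below is an illustrative placeholder, not a
    parameter reported in the paper.
    """
    rng = np.random.default_rng(seed)
    n_total = n_train + n_val + n_test
    trajectories = []
    for _ in range(n_total):
        chemo = rng.uniform(0.0, 1.0, horizon)             # continuous treatment
        radio = rng.integers(0, 2, horizon).astype(float)  # binary treatment
        x0 = rng.uniform(1.0, 10.0)
        trajectories.append(simulate_tumor(
            x0, chemo, radio, rho=0.01, K=100.0, beta_c=0.02,
            alpha_r=0.03, beta_r=0.003, noise_std=0.001, rng=rng))
    data = np.stack(trajectories)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```

Calling `train, val, test = make_splits()` returns three arrays of shape `(n, horizon + 1)`. The paper's actual generator, parameter distributions, and treatment policies are described in its appendix F and the linked repository, which should be preferred over this sketch for any reproduction attempt.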