Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Counterfactual Identifiability via Dynamic Optimal Transport
Authors: Fabio De Sousa Ribeiro, Ainkaran Santhirasekaram, Ben Glocker
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct two sets of experiments: (i) a constructed scenario where the ground-truth counterfactuals are known and we can, in principle, verify our identifiability claims; (ii) a real-world medical imaging dataset widely used for counterfactual inference. |
| Researcher Affiliation | Academia | Fabio De Sousa Ribeiro, Ainkaran Santhirasekaram, Ben Glocker; Imperial College London, UK |
| Pseudocode | No | The paper describes methods and formulations in natural language and mathematical equations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We are working on an open-source version of the code and guarantee that it will be made publicly available. |
| Open Datasets | Yes | For this study, we adapt the counterfactual ellipse generation setup by Nasr-Esfahany et al. (2023). |
| Dataset Splits | Yes | In all cases, we split our datasets into 70/10/20% for training, validation and testing, respectively. ... We then divided the dataset into 62,336 subjects for training, 9,968 for validation and 30,535 for testing, again following the exact same protocol as both Ribeiro et al. (2023) and Xia et al. (2024). |
| Hardware Specification | Yes | All our models were trained on L40 GPUs, with the full model fitting entirely on a single GPU. |
| Software Dependencies | No | We used PyTorch (Paszke et al., 2019) to train all our models for 500k steps, under identical hyperparameter setups. |
| Experiment Setup | Yes | We used the AdamW optimizer with a learning rate of 10^-4, weight decay of 10^-4, β1 = 0.9, β2 = 0.999, ϵ = 10^-8, and batch size 256. ... For training, we used the AdamW optimiser with a learning rate of 10^-4, weight decay of 10^-4, β1 = 0.9, β2 = 0.999, ϵ = 10^-8, and a batch size of 64. |
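
For readers trying to reproduce the reported setup, the quoted hyperparameters and the 70/10/20% split can be written down as a minimal sketch. This is not the authors' code, which is not yet released. The names `ADAMW_KWARGS`, `BATCH_SIZES`, and `split_indices`, the random-shuffle splitting strategy, the seed, and the example dataset size of 1000 are all assumptions made for illustration. The dictionary keys are chosen to match the keyword arguments of `torch.optim.AdamW`.

```python
import random

# AdamW settings as quoted in the table; keys mirror torch.optim.AdamW keywords.
ADAMW_KWARGS = {
    "lr": 1e-4,
    "weight_decay": 1e-4,
    "betas": (0.9, 0.999),  # (beta1, beta2)
    "eps": 1e-8,
}

# The two batch sizes quoted in the table. Which experiment each belongs to
# is not stated in the quoted text, so no mapping is assumed here.
BATCH_SIZES = (256, 64)


def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: the shuffle and the seed are assumptions, only the
    70/10/20 proportions come from the quoted text.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    # The test set takes whatever remains, so the three parts always cover n.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Example with an assumed dataset size of 1000 samples.
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

With PyTorch installed, the settings could be passed as `torch.optim.AdamW(model.parameters(), **ADAMW_KWARGS)`, where `model` is whatever network is being trained. Note that the fixed subject counts quoted in the Dataset Splits row (62,336 / 9,968 / 30,535) follow an earlier protocol and are not produced by this helper.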