Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Causal Explanation-Guided Learning for Organ Allocation

Authors: Alessandro Marchese, Jeroen Berrevoets, Sam Verboven

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through comprehensive synthetic and semi-synthetic experiments, we demonstrate that CLEXNET outperforms existing acceptance models in generalization, calibration, and predictive accuracy, offering a practical and robust improvement for policy simulators. More broadly, our approach opens a new direction in counterfactual machine learning by operationalizing contrastive human feedback in high-stakes, observational settings like organ transplantation.
Researcher Affiliation	Academia	Alessandro Marchese, Vrije Universiteit Brussel; Jeroen Berrevoets, King's College London; Sam Verboven, Vrije Universiteit Brussel
Pseudocode	Yes	Algorithm 1: CLEXNET: single instance training step with explanation-guided augmented loss
Open Source Code	Yes	All code, synthetic generators and an implementation of CLEXNET are made public to facilitate independent assessment: https://github.com/Alessandro Marchese/Clex Net.
Open Datasets	Yes	For the semi-synthetic evaluation, we use UNOS-PTR [15] liver offers recorded between 2021 and 2024. This data consists of approximately 1.1M offers made between 24k unique organs and 46k unique patients.
Dataset Splits	Yes	D_obs is then split further into D_train, which is used to train the models, and D_test, which is used to test the models on observational data. We allocate 70% of D_obs to D_train, 15% to a validation set, and 15% to D_test (stratified by Y).
Hardware Specification	Yes	Experiments ran on a 13th Gen Intel(R) Core(TM) i9-13900HX processor with 32GB RAM.
Software Dependencies	No	The paper mentions software components like PyTorch implicitly through the neural network architecture, but it does not specify any software dependencies with their corresponding version numbers.
Experiment Setup	Yes	Table 7: CLEXNET's Hyperparameters. Shared Encoder Φθϕ(X, O): Dense(32, L2), ReLU Activation; Dense(32, L2), ReLU Activation. Acceptance Head YθY(ϕ): Dense(32, L2), Sigmoid Activation. Organ Cluster Head cθp(ϕ): Dense(32, L2), ReLU Activation. Organ Cluster Amount k: 3. Organ Cluster Loss Weight λ: 0.15. Explanation Loss Weight ρ: 0.15. Augmentation Batch M: 100. Training Parameters: Maximum Epochs: 1000; Patience: 30; Learning Rate: 1e-3.
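
The Dataset Splits row reports a 70% / 15% / 15% train/validation/test partition stratified by the outcome Y. The following is a minimal sketch of one way such a split could be produced, not the authors' implementation: the function name `stratified_split`, the seed, and the toy labels are all hypothetical, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, stratified by label.

    Hypothetical helper for illustration; the 70/15/15 fractions follow
    the Dataset Splits row, everything else is an assumption.
    """
    rng = random.Random(seed)

    # Group example indices by their label so each class is split separately.
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_val = round(fractions[1] * n)
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test


# Toy data (assumed): 1000 offers, of which 200 have Y = 1.
labels = [1] * 200 + [0] * 800
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because each class is partitioned separately, the share of Y = 1 examples is the same in all three subsets up to rounding, which is what stratifying by Y is meant to achieve.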