Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Detecting Data Deviations in Electronic Health Records

Authors: Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on both real-world EHR data from National University Hospital in Singapore and the public MIMIC-III dataset consistently validate the effectiveness of our approach in detecting data deviations in EHR data. Case studies further demonstrate its practical value in identifying clinically meaningful data deviations.
Researcher Affiliation	Academia	(1) School of Computing, National University of Singapore; (2) Division of Nephrology, Department of Medicine, National University Hospital; (3) Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore; (4) School of Software Technology, Zhejiang University
Pseudocode	Yes	Appendix C. Pseudocode for Core Stages of the Methodology. Appendix C.1. Algorithm 1: Data Shapley Value Computation Per Task in Ods. Appendix C.2. Algorithm 2: Knowledge Distillation from Ods to Onn. Appendix C.3. Algorithm 3: Knowledge Distillation from Onn to Ψ.
Open Source Code	Yes	Answer: [Yes] Justification: We provide the code in the supplementary materials.
Open Datasets	Yes	Experiments on both real-world EHR data from National University Hospital in Singapore and the public MIMIC-III dataset consistently validate the effectiveness of our approach in detecting data deviations in EHR data.
Dataset Splits	Yes	We partition the cohort into 85% for model development and 15% as a held-out set for computing data Shapley values. Within the 85%, we further split the data into 80% for training, 10% for validation, and 10% for testing.
Hardware Specification	Yes	The experiments are conducted on a server equipped with two Intel Xeon Gold 6248R CPUs, 768 GB of memory, and eight NVIDIA V100 GPUs connected via NVLINK.
Software Dependencies	Yes	All models are implemented using PyTorch version 1.12.1.
Experiment Setup	Yes	Regarding the hyperparameter settings, the task-specific neural oracle Onn is implemented as a multilayer perceptron (MLP) with three hidden layers of sizes 32, 16, and 8, respectively. The unified EHR data fidelity predictor Ψ is also an MLP, with hidden layers of sizes 64, 32, and 16. The representation dimension of r(t)(x) in the attention subnetwork (Equation 7) is set to 32. We use a learning rate of 0.01 for training Onn and 0.0001 for Ψ. The temperature parameter τ in Equation 11 is set to 0.5. Training is conducted for a maximum of 1000 epochs with a batch size of 128. Early stopping is employed if the validation performance does not improve for 50 consecutive epochs.
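
The nested split percentages and the early-stopping rule quoted in the table can be made concrete with a small sketch. This is not the authors' code: the function names `split_indices` and `early_stop_epoch`, the random shuffling, the seed, and the assumption that a higher validation score is better are all illustrative choices not stated in the source. Only the percentages (85/15, then 80/10/10), the patience of 50 epochs, and the 1000-epoch cap come from the entries above.

```python
import random


def split_indices(n, seed=0):
    """Partition n record indices following the quoted percentages.

    85% development / 15% held out for data Shapley values, then the
    development part is split 80/10/10 into train/val/test.
    Shuffling and the seed are assumptions for illustration only.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_dev = (n * 85) // 100
    dev, holdout = idx[:n_dev], idx[n_dev:]

    n_train = (n_dev * 80) // 100
    n_val = (n_dev * 10) // 100
    return {
        "train": dev[:n_train],
        "val": dev[n_train:n_train + n_val],
        "test": dev[n_train + n_val:],  # remainder of the development part
        "shapley_holdout": holdout,
    }


def early_stop_epoch(val_scores, patience=50, max_epochs=1000):
    """Return the 1-based epoch at which training would end.

    Stops once the validation score has not improved for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    Assumes higher scores are better (hypothetical; the source does
    not name the validation metric).
    """
    best = float("-inf")
    stale = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_scores), max_epochs)


if __name__ == "__main__":
    parts = split_indices(1000)
    print({name: len(ids) for name, ids in parts.items()})
```

Under these assumptions the training, validation, test, and Shapley hold-out subsets amount to 68%, 8.5%, 8.5%, and 15% of the full cohort, so a cohort of 1000 records yields 680, 85, 85, and 150 records respectively.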