Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Detecting Data Deviations in Electronic Health Records
Authors: Kaiping Zheng, Horng-Ruey Chua, Beng Chin Ooi
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both real-world EHR data from National University Hospital in Singapore and the public MIMIC-III dataset consistently validate the effectiveness of our approach in detecting data deviations in EHR data. Case studies further demonstrate its practical value in identifying clinically meaningful data deviations. |
| Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore; (2) Division of Nephrology, Department of Medicine, National University Hospital; (3) Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore; (4) School of Software Technology, Zhejiang University |
| Pseudocode | Yes | Appendix C. Pseudocode for Core Stages of the Methodology. Appendix C.1. Algorithm 1: Data Shapley Value Computation Per Task in O_ds. Appendix C.2. Algorithm 2: Knowledge Distillation from O_ds to O_nn. Appendix C.3. Algorithm 3: Knowledge Distillation from O_nn to Ψ. |
| Open Source Code | Yes | Answer: [Yes] Justification: We provide the code in the supplementary materials. |
| Open Datasets | Yes | Experiments on both real-world EHR data from National University Hospital in Singapore and the public MIMIC-III dataset consistently validate the effectiveness of our approach in detecting data deviations in EHR data. |
| Dataset Splits | Yes | We partition the cohort into 85% for model development and 15% as a held-out set for computing data Shapley values. Within the 85%, we further split the data into 80% for training, 10% for validation, and 10% for testing. |
| Hardware Specification | Yes | The experiments are conducted on a server equipped with two Intel Xeon Gold 6248R CPUs, 768 GB of memory, and eight NVIDIA V100 GPUs connected via NVLINK. |
| Software Dependencies | Yes | All models are implemented using PyTorch version 1.12.1. |
| Experiment Setup | Yes | Regarding the hyperparameter settings, the task-specific neural oracle O_nn is implemented as a multilayer perceptron (MLP) with three hidden layers of sizes 32, 16, and 8, respectively. The unified EHR data fidelity predictor Ψ is also an MLP, with hidden layers of sizes 64, 32, and 16. The representation dimension of r(t)(x) in the attention subnetwork (Equation 7) is set to 32. We use a learning rate of 0.01 for training O_nn and 0.0001 for Ψ. The temperature parameter τ in Equation 11 is set to 0.5. Training is conducted for a maximum of 1000 epochs with a batch size of 128. Early stopping is employed if the validation performance does not improve for 50 consecutive epochs. |
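
The nested split quoted in the Dataset Splits row can be made concrete with a short sketch. Relative to the full cohort, the fractions work out to 0.85 × 0.80 = 68% training, 8.5% validation, 8.5% testing, and 15% held out for data Shapley values. The function name `split_cohort`, the fixed seed, the uniform random shuffle, and the rounding scheme are illustrative assumptions; the quoted text does not say how samples are assigned to partitions.

```python
import random


def split_cohort(n_samples, seed=0):
    """Partition sample indices as described in the Dataset Splits row.

    15% is held out for computing data Shapley values; the remaining 85%
    is split 80/10/10 into training, validation, and test sets.
    """
    indices = list(range(n_samples))
    # Assumption: a seeded uniform shuffle; the source does not specify this.
    random.Random(seed).shuffle(indices)

    # Outer split: 85% model development, 15% Shapley hold-out.
    n_dev = round(0.85 * n_samples)
    dev, shapley_holdout = indices[:n_dev], indices[n_dev:]

    # Inner split of the development portion: 80% / 10% / 10%.
    n_train = round(0.80 * n_dev)
    n_val = round(0.10 * n_dev)
    return {
        "train": dev[:n_train],
        "val": dev[n_train:n_train + n_val],
        "test": dev[n_train + n_val:],
        "shapley_holdout": shapley_holdout,
    }


splits = split_cohort(10_000)
sizes = {name: len(idx) for name, idx in splits.items()}
```

For a hypothetical cohort of 10,000 samples, `sizes` contains 6,800 training, 850 validation, 850 test, and 1,500 hold-out indices.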
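
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, together with a sketch of the early-stopping rule. This is a minimal illustration in plain Python, not the authors' implementation: the names `ExperimentConfig`, `EarlyStopper`, and `epochs_run` are hypothetical, and the sketch assumes a validation metric where higher is better, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    oracle_hidden_sizes: tuple = (32, 16, 8)      # task-specific neural oracle MLP
    predictor_hidden_sizes: tuple = (64, 32, 16)  # unified fidelity predictor MLP
    attention_repr_dim: int = 32                  # representation size (Equation 7)
    oracle_lr: float = 0.01
    predictor_lr: float = 0.0001
    temperature: float = 0.5                      # tau in Equation 11
    max_epochs: int = 1000
    batch_size: int = 128
    patience: int = 50                            # early-stopping window in epochs


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = None
        self.epochs_without_improvement = 0

    def should_stop(self, val_metric):
        # Assumption: larger validation metric means better performance.
        if self.best is None or val_metric > self.best:
            self.best = val_metric
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_metrics, config):
    """Return how many epochs would run for a given validation-metric trace."""
    stopper = EarlyStopper(config.patience)
    trace = val_metrics[:config.max_epochs]  # never exceed the epoch budget
    for epoch, metric in enumerate(trace, start=1):
        if stopper.should_stop(metric):
            return epoch
    return len(trace)


config = ExperimentConfig()
stalled = epochs_run([0.5] + [0.4] * 100, config)  # plateau after epoch 1
capped = epochs_run(list(range(1200)), config)     # improves every epoch
```

In the first trace the metric never improves after epoch 1, so training stops once 50 further epochs have passed; in the second it improves every epoch, so only the 1000-epoch cap applies.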