Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

OrdShap: Feature Position Importance for Sequential Black-Box Models

Authors: Davin Hill, Brian Hill, Aria Masoomi, Vijay Nori, Robert E. Tillman, Jennifer Dy

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate OrdShap on its ability to identify the importance of a given sample's feature ordering. In 6.1 we quantitatively evaluate OrdShap against existing attribution methods. In 6.2 we investigate OrdShap on a synthetic dataset. In 6.3 we qualitatively compare OrdShap with Kernel SHAP. In 6.4 we present execution time results.
Researcher Affiliation	Collaboration	Davin Hill (Northeastern University), Brian L. Hill (Age Bold), Aria Masoomi (Northeastern University), Vijay S. Nori (Optum AI), Robert E. Tillman (Optum AI), Jennifer Dy (Northeastern University)
Pseudocode	Yes	Algorithm 1: OrdShap Sampling Algorithm
Open Source Code	No	All source code will be provided for the review process. Public release of the source code is contingent on an internal review process and will be released if/when possible after acceptance.
Open Datasets	Yes	We evaluated OrdShap on two EHR datasets (MIMIC-III [34] and eICU [57]) and a natural language dataset (IMDB [45]).
Dataset Splits	Yes	We split the patients into a training set (80%) and test set (20%).
Hardware Specification	Yes	All experiments were performed on an internal cluster using AMD 7302 16-Core processors and NVIDIA A100 GPUs.
Software Dependencies	No	We train a modified BERT model [19] using the Huggingface library [85] with 6 attention heads, 3 hidden layers of width 384, and a dropout rate of 0.5. We use the implementation from the Captum library [37] in our experiments. We use the official implementation of Kernel SHAP from the SHAP library.
Experiment Setup	Yes	After preprocessing, we train a modified BERT model [19] using the Huggingface library [85] with 6 attention heads, 3 hidden layers of width 384, and a dropout rate of 0.5.
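
The Dataset Splits entry above reports an 80% training / 20% test split over patients but gives no further detail. The following is a minimal sketch of what a patient-level split of that kind could look like, not the authors' procedure. The function name split_patients, the fixed seed, the shuffling step, and the absence of stratification are all assumptions made for illustration; none of them is stated in the source.

```python
import random


def split_patients(patient_ids, train_frac=0.8, seed=0):
    """Split unique patient IDs into train and test lists.

    Hypothetical helper: the seed and the shuffle-then-slice strategy
    are illustrative assumptions, not details taken from the paper.
    """
    # Deduplicate so each patient lands in exactly one partition,
    # and sort first so the result depends only on the seed.
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = int(round(train_frac * len(unique_ids)))
    return unique_ids[:n_train], unique_ids[n_train:]


# Example with 100 synthetic patient IDs (placeholder data).
train_ids, test_ids = split_patients(range(100))
print(len(train_ids), len(test_ids))
```

Splitting on patient IDs rather than on individual records keeps all of a patient's data in one partition, which is a common precaution with EHR data; whether the paper does exactly this beyond "split the patients" is not specified here.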