Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BELLA: Black-box model Explanations by Local Linear Approximations
Authors: Nedeljko Radulović, Albert Bifet, Fabian M. Suchanek
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We can show through extensive experiments (in Section 5) on a dozen datasets that BELLA beats all existing approaches across nearly all desiderata. ... We performed experiments on datasets from two standard repositories (Dua and Graff, 2017; Romano et al., 2021) (shown in Table 1). Among them is also a high-dimensional dataset, Superconductivity, with 81 features. All categorical features have been one-hot encoded and all numerical features have been standardized. We draw a random 10% of each dataset as testing data. To show that BELLA works with different families of models, we trained a random forest (with 100 trees), and a neural network (with one hidden layer with 500 nodes) as black-box models. Since the results do not differ much, we show only experiments with the neural network here, while the experiments with the random forest are in Appendix A. |
| Researcher Affiliation | Academia | Nedeljko Radulović EMAIL Telecom Paris, Institut Polytechnique de Paris, France Albert Bifet EMAIL Telecom Paris, Institut Polytechnique de Paris, France Fabian M. Suchanek EMAIL Telecom Paris, Institut Polytechnique de Paris, France |
| Pseudocode | Yes | Algorithm 1 BELLA. Input: Dataset T with labels Y; Labeled data point x ∈ T ... Algorithm 2 Neighborhood Search. Input: Labeled data point x ∈ T; Dataset T with labels Y; Distances d : T → ℝ of the data points to x ... Algorithm 3 Train Local Surrogate Model. Input: Neighborhood of data points N |
| Open Source Code | Yes | All code and the data for BELLA and the experiments is available on Github (URL masked for anonymity). |
| Open Datasets | Yes | Datasets. We performed experiments on datasets from two standard repositories (Dua and Graff, 2017; Romano et al., 2021) (shown in Table 1). Among them is also a high-dimensional dataset, Superconductivity, with 81 features. All categorical features have been one-hot encoded and all numerical features have been standardized. |
| Dataset Splits | Yes | We draw a random 10% of each dataset as testing data. |
| Hardware Specification | Yes | All experiments are run on a Fedora Linux (release 38) computer with an Intel(R) Xeon(R) v4 @ 2.20GHz CPU, a memory of 64 GB, and Python 3.9. |
| Software Dependencies | Yes | All experiments are run on a Fedora Linux (release 38) computer with an Intel(R) Xeon(R) v4 @ 2.20GHz CPU, a memory of 64 GB, and Python 3.9. We use the implementations of scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | To show that BELLA works with different families of models, we trained a random forest (with 100 trees), and a neural network (with one hidden layer with 500 nodes) as black-box models. ... Our method is implemented in Python. We set the step size to 10%. ... To determine the value of the shrinkage coefficient λ, we use 5-fold cross-validation (CV). To preserve the deterministic nature, we perform CV on adjacent slices of the dataset, without random shuffles. CV selects the best model in terms of the prediction error. Since the goal of this step is model selection, we want to avoid choosing λ too small, and hence we apply the common one-standard error rule. According to this rule, the most parsimonious model is the one whose error is no more than one standard error above the error of the best model (Hastie et al., 2009). |
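The paper's Algorithms 1–3 (neighborhood search, then training a local surrogate) are given in the venue PDF; as a rough, hypothetical illustration of the general "local linear approximation" idea only, the sketch below fits a ridge-regularized line on the k points nearest to the query point. The 1-D feature, the function name, and all parameter defaults are illustrative assumptions, not BELLA's actual procedure.

```python
def local_linear_surrogate(X, y, x0, k=10, lam=1.0):
    """Toy sketch of a local linear approximation: fit a ridge-regularized
    line on the k points of (X, y) nearest to x0 (1-D features only).
    Illustrates the general idea, NOT the paper's Algorithms 1-3."""
    # neighborhood: indices of the k nearest points by distance to x0
    order = sorted(range(len(X)), key=lambda i: abs(X[i] - x0))[:k]
    xs = [X[i] for i in order]
    ys = [y[i] for i in order]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # closed-form ridge estimate for the slope; intercept from the means
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in range(n))
    w = sxy / (sxx + lam)
    b = my - w * mx
    return w, b  # local explanation: y ~ w * x + b near x0
```

With `lam=0.0` on exactly linear data the surrogate recovers the true slope and intercept; a positive `lam` shrinks the slope, trading local fidelity for simplicity.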
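The quoted setup describes two specific mechanisms: deterministic 5-fold CV on adjacent slices (no shuffling) and the one-standard-error rule of Hastie et al. (2009) for choosing the shrinkage coefficient λ. A minimal sketch of both, assuming per-fold errors have already been computed for each candidate λ and that a larger λ means a more parsimonious model (both assumptions; the paper's own implementation may differ):

```python
import math

def adjacent_folds(n, k=5):
    """Split indices 0..n-1 into k contiguous slices without shuffling,
    mirroring the deterministic CV on adjacent slices described above."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def one_se_rule(lambdas, fold_errors):
    """One-standard-error rule: among models whose mean CV error is within
    one standard error of the best model's error, pick the most
    parsimonious one (here assumed to be the largest lambda)."""
    stats = []
    for lam, errs in zip(lambdas, fold_errors):
        k = len(errs)
        mean = sum(errs) / k
        var = sum((e - mean) ** 2 for e in errs) / (k - 1)
        se = math.sqrt(var / k)  # standard error of the mean CV error
        stats.append((lam, mean, se))
    best_mean, best_se = min((m, s) for _, m, s in stats)
    threshold = best_mean + best_se
    return max(lam for lam, m, _ in stats if m <= threshold)
```

For example, if λ = 1.0 has a slightly higher mean error than λ = 0.1 but stays within one standard error of it, the rule selects λ = 1.0, avoiding a λ that is too small, exactly the motivation quoted in the setup.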