Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BELLA: Black-box model Explanations by Local Linear Approximations
Authors: Nedeljko Radulović, Albert Bifet, Fabian M. Suchanek
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We can show through extensive experiments (in Section 5) on a dozen datasets that BELLA beats all existing approaches across nearly all desiderata. ... We performed experiments on datasets from two standard repositories (Dua and Graff, 2017; Romano et al., 2021) (shown in Table 1). Among them is also a high-dimensional dataset, Superconductivity, with 81 features. All categorical features have been one-hot encoded and all numerical features have been standardized. We draw a random 10% of each dataset as testing data. To show that BELLA works with different families of models, we trained a random forest (with 100 trees), and a neural network (with one hidden layer with 500 nodes) as black-box models. Since the results do not differ much, we show only experiments with the neural network here, while the experiments with the random forest are in Appendix A. |
| Researcher Affiliation | Academia | Nedeljko Radulović EMAIL Telecom Paris, Institut Polytechnique de Paris, France Albert Bifet EMAIL Telecom Paris, Institut Polytechnique de Paris, France Fabian M. Suchanek EMAIL Telecom Paris, Institut Polytechnique de Paris, France |
| Pseudocode | Yes | Algorithm 1 BELLA. Input: Dataset T with labels Y; Labeled data point x ∈ T ... Algorithm 2 Neighborhood Search. Input: Labeled data point x ∈ T; Dataset T with labels Y; Distances d : T → ℝ of the data points to x ... Algorithm 3 Train Local Surrogate Model. Input: Neighborhood of data points N |
| Open Source Code | Yes | All code and the data for BELLA and the experiments is available on Github (URL masked for anonymity). |
| Open Datasets | Yes | Datasets. We performed experiments on datasets from two standard repositories (Dua and Graff, 2017; Romano et al., 2021) (shown in Table 1). Among them is also a high-dimensional dataset, Superconductivity, with 81 features. All categorical features have been one-hot encoded and all numerical features have been standardized. |
| Dataset Splits | Yes | We draw a random 10% of each dataset as testing data. |
| Hardware Specification | Yes | All experiments are run on a Fedora Linux (release 38) computer with an Intel(R) Xeon(R) v4 @ 2.20GHz CPU, a memory of 64 GB, and Python 3.9. |
| Software Dependencies | Yes | All experiments are run on a Fedora Linux (release 38) computer with an Intel(R) Xeon(R) v4 @ 2.20GHz CPU, a memory of 64 GB, and Python 3.9. We use the implementations of scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | To show that BELLA works with different families of models, we trained a random forest (with 100 trees), and a neural network (with one hidden layer with 500 nodes) as black-box models. ... Our method is implemented in Python. We set the step size to 10%. ... To determine the value of the shrinkage coefficient λ, we use 5-fold cross-validation (CV). To preserve the deterministic nature, we perform CV on adjacent slices of the dataset, without random shuffles. CV selects the best model in terms of the prediction error. Since the goal of this step is model selection, we want to avoid choosing λ too small, and hence we apply the common one-standard error rule. According to this rule, the most parsimonious model is the one whose error is no more than one standard error above the error of the best model (Hastie et al., 2009). |
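The paper's Algorithms 1–3 (neighborhood search, then training a local surrogate) are given in the venue PDF; as a rough, hypothetical illustration of the general "local linear approximation" idea only, the sketch below fits a ridge-regularized line on the k points nearest to the query point. The 1-D feature, the function name, and all parameter defaults are illustrative assumptions, not BELLA's actual procedure.

```python
def local_linear_surrogate(X, y, x0, k=10, lam=1.0):
    """Toy sketch of a local linear approximation: fit a ridge-regularized
    line on the k points of (X, y) nearest to x0 (1-D features only).
    Illustrates the general idea, NOT the paper's Algorithms 1-3."""
    # neighborhood: indices of the k nearest points by distance to x0
    order = sorted(range(len(X)), key=lambda i: abs(X[i] - x0))[:k]
    xs = [X[i] for i in order]
    ys = [y[i] for i in order]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # closed-form ridge estimate for the slope; intercept from the means
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in range(n))
    w = sxy / (sxx + lam)
    b = my - w * mx
    return w, b  # local explanation: y ~ w * x + b near x0
```

With `lam=0.0` on exactly linear data the surrogate recovers the true slope and intercept; a positive `lam` shrinks the slope, trading local fidelity for simplicity.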
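The quoted setup describes two specific mechanisms: deterministic 5-fold CV on adjacent slices (no shuffling) and the one-standard-error rule of Hastie et al. (2009) for choosing the shrinkage coefficient λ. A minimal sketch of both, assuming per-fold errors have already been computed for each candidate λ and that a larger λ means a more parsimonious model (both assumptions; the paper's own implementation may differ):

```python
import math

def adjacent_folds(n, k=5):
    """Split indices 0..n-1 into k contiguous slices without shuffling,
    mirroring the deterministic CV on adjacent slices described above."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def one_se_rule(lambdas, fold_errors):
    """One-standard-error rule: among models whose mean CV error is within
    one standard error of the best model's error, pick the most
    parsimonious one (here assumed to be the largest lambda)."""
    stats = []
    for lam, errs in zip(lambdas, fold_errors):
        k = len(errs)
        mean = sum(errs) / k
        var = sum((e - mean) ** 2 for e in errs) / (k - 1)
        se = math.sqrt(var / k)  # standard error of the mean CV error
        stats.append((lam, mean, se))
    best_mean, best_se = min((m, s) for _, m, s in stats)
    threshold = best_mean + best_se
    return max(lam for lam, m, _ in stats if m <= threshold)
```

For example, if λ = 1.0 has a slightly higher mean error than λ = 0.1 but stays within one standard error of it, the rule selects λ = 1.0, avoiding a λ that is too small, exactly the motivation quoted in the setup.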