Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Evaluating and Aggregating Feature-based Model Explanations
Authors: Umang Bhatt, Adrian Weller, José M. F. Moura
IJCAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate models trained on the following datasets: Adult, Iris [Dua and Graff, 2017], MIMIC [Johnson et al., 2016], and MNIST [LeCun et al., 1998]. ... In Table 2, we report results for faithfulness for various explanation functions. ... In Table 3, we report the max and average sensitivities for various explanation functions. (Hedged sketches of faithfulness- and sensitivity-style computations appear after the table.) |
| Researcher Affiliation | Academia | Umang Bhatt¹,², Adrian Weller¹,³ and José M. F. Moura²; ¹University of Cambridge, ²Carnegie Mellon University, ³The Alan Turing Institute |
| Pseudocode | No | The paper describes algorithms but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific repository link, explicit code release statement, or mention code in supplementary materials for the described methodology. |
| Open Datasets | Yes | We evaluate models trained on the following datasets: Adult, Iris [Dua and Graff, 2017], MIMIC [Johnson et al., 2016], and MNIST [LeCun et al., 1998]. |
| Dataset Splits | No | The paper mentions 'test accuracy' but does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'multilayer perceptron (MLP)', 'leaky-ReLU activation', and 'ADAM optimizer' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper mentions using a 'multilayer perceptron (MLP)' with 'leaky-ReLU activation' and the 'ADAM optimizer' but does not provide specific hyperparameter values, training configurations, or detailed system-level settings for the experimental setup. (An illustrative sketch of such a configuration appears after the table.) |
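To make the Research Type row's evidence concrete: Tables 2 and 3 of the paper report faithfulness and sensitivity for various explanation functions. The sketch below is an illustrative reconstruction of metrics in that spirit, not the authors' exact definitions; the zero baseline, perturbation radius, sample count, and the helper names `predict` and `explain` are all assumptions made here for illustration.

```python
import numpy as np

def faithfulness_correlation(predict, x, attributions, baseline=0.0):
    """Correlate each feature's attribution with the drop in model output
    when that feature is replaced by a baseline value (assumed: zero)."""
    base_score = predict(x)
    drops = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = baseline  # assumed baseline: zero imputation
        drops[i] = base_score - predict(x_masked)
    # Pearson correlation between attributions and per-feature output drops.
    return np.corrcoef(attributions, drops)[0, 1]

def max_sensitivity(explain, x, radius=0.1, n_samples=10, seed=0):
    """Largest change in an explanation under small uniform input
    perturbations; radius and sample count are illustrative choices."""
    rng = np.random.default_rng(seed)
    e0 = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-radius, radius, size=x.shape)
        worst = max(worst, float(np.linalg.norm(explain(x + delta) - e0)))
    return worst
```

Both helpers assume `predict` maps a 1-D feature vector to a scalar score and `explain` maps it to an attribution vector of the same shape.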
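For the Experiment Setup row, the paper names an MLP with leaky-ReLU activations trained with the ADAM optimizer but reports no hyperparameters. This minimal PyTorch sketch shows the level of detail a reproducible setup would need to pin down; every value below (layer widths, learning rate, epoch count) is an assumption, not a value from the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, n_classes: int) -> nn.Sequential:
    # Depth and widths are assumed; the paper does not report the architecture.
    return nn.Sequential(
        nn.Linear(in_dim, 128),
        nn.LeakyReLU(),
        nn.Linear(128, 64),
        nn.LeakyReLU(),
        nn.Linear(64, n_classes),
    )

def train(model, loader, epochs=20, lr=1e-3):
    # Adam matches the optimizer named in the paper; lr and epochs are assumed.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```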