Concept Activation Regions: A Generalized Framework For Concept-Based Explanations
Authors: Jonathan Crabbé, Mihaela van der Schaar
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments. The code to reproduce all the experiments from this section is available at https://github.com/JonathanCrabbe/CARs and https://github.com/vanderschaarlab/CARs. 3.1 Empirical Evaluation. Our purpose is to empirically validate the formalism described in the previous section. We have several independent components to evaluate: (1) the concept classifier used to detect the CARs H^c, (2) the global explanations induced by the TCAR values and (3) the feature importance scores induced by the concept densities ρ^c. Datasets. We perform our experiments on 3 datasets. |
| Researcher Affiliation | Academia | Jonathan Crabbé University of Cambridge jc2133@cam.ac.uk Mihaela van der Schaar University of Cambridge The Alan Turing Institute UCLA mv472@cam.ac.uk |
| Pseudocode | Yes | The implementation of our method closely follows Algorithms 1, 2 and 4 in the appendices. |
| Open Source Code | Yes | The code to reproduce all the experiments from this section is available at https://github.com/JonathanCrabbe/CARs and https://github.com/vanderschaarlab/CARs. |
| Open Datasets | Yes | Datasets. We perform our experiments on 3 datasets. (1) The MNIST dataset [56]... (2) The MIT-BIH Electrocardiogram (ECG) dataset [57, 58]... (3) The Caltech-UCSD Birds-200 (CUB) dataset [59]... We use the data collected with the Surveillance, Epidemiology, and End Results (SEER) Program. The dataset [69]... |
| Dataset Splits | Yes | We train a multilayer perceptron (MLP) to predict the patient's mortality on 90% of the data and test on the remaining 10%. The classifier is then evaluated by computing its accuracy on a holdout balanced concept set T^c of size 100 sampled from the model's testing set. |
| Hardware Specification | No | Our computing resources are described in Appendices E and F. These appendices are not included in the provided text, so specific hardware details are not explicitly described in the main body. |
| Software Dependencies | No | The paper mentions using Python, Scikit-learn, PyTorch, and Captum, and states that software dependencies are described in Appendices E and F. However, the provided text does not specify version numbers for these components. For example, [71] cites 'Scikit-learn: Machine learning in Python' and [72] cites 'Captum: A unified and generic model interpretability library for PyTorch' without version numbers for Scikit-learn, PyTorch, or Captum. |
| Experiment Setup | Yes | For several of those latent spaces, we fit our CAR classifier (SVC with radial basis function kernel) to discriminate the concept sets P^c, N^c for each concept c ∈ [C]. These two sets have a size N^c = 200 and are sampled from the model's training set. We train a multilayer perceptron (MLP)... We fit a CAR classifier (SVC with linear kernel) to discriminate the concept sets P^c, N^c for each grade c ∈ [5]. Both of those sets contain N^c = 250 patients sampled from the training set. |
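
The setup quoted in the Experiment Setup and Dataset Splits rows can be illustrated with a minimal sketch. It fits a scikit-learn `SVC` with an RBF kernel on positive and negative concept sets of size 200, then scores it on a balanced holdout set of size 100. This is not the authors' implementation: the helper `fit_car_classifier`, the latent dimension, and the Gaussian arrays standing in for a model's latent representations are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def fit_car_classifier(pos_latents, neg_latents, kernel="rbf"):
    """Fit an SVC separating positive from negative concept examples.

    Hypothetical helper: inputs are arrays of latent representations
    with shape (n_examples, latent_dim).
    """
    features = np.concatenate([pos_latents, neg_latents])
    labels = np.concatenate(
        [np.ones(len(pos_latents)), np.zeros(len(neg_latents))]
    )
    classifier = SVC(kernel=kernel)
    classifier.fit(features, labels)
    return classifier


# Synthetic stand-ins for latent representations (assumption: in the
# paper these come from a trained model's hidden layer, not a Gaussian).
rng = np.random.default_rng(0)
latent_dim = 16      # assumed for illustration
concept_size = 200   # size of each concept set, as quoted in the table

pos_train = rng.normal(loc=2.0, size=(concept_size, latent_dim))
neg_train = rng.normal(loc=-2.0, size=(concept_size, latent_dim))
car_classifier = fit_car_classifier(pos_train, neg_train, kernel="rbf")

# Balanced holdout concept set of size 100 (50 positives, 50 negatives).
pos_test = rng.normal(loc=2.0, size=(50, latent_dim))
neg_test = rng.normal(loc=-2.0, size=(50, latent_dim))
test_features = np.concatenate([pos_test, neg_test])
test_labels = np.concatenate([np.ones(50), np.zeros(50)])

accuracy = car_classifier.score(test_features, test_labels)
print(f"Holdout concept accuracy: {accuracy:.2f}")
```

Passing `kernel="linear"` to the same helper gives the linear-kernel variant that the table quotes for the SEER experiment, where each concept set contains 250 patients.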