On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Authors: Chih-Kuan Yeh, Been Kim, Sercan Ö. Arık, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate our method both on a synthetic dataset, where we have ground truth concept importance, as well as on real-world image and language datasets. |
| Researcher Affiliation | Collaboration | Chih-Kuan Yeh1, Been Kim2, Sercan Ö. Arık3, Chun-Liang Li3, Tomas Pfister3, and Pradeep Ravikumar1 1Machine Learning Department, Carnegie Mellon University 2Google Brain 3Google Cloud AI |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/chihkuanyeh/concept_exp. |
| Open Datasets | Yes | We perform experiments on Animals with Attributes (AwA) [Lampert et al., 2009] that contains 50 animal classes. ... We apply our method on IMDB, a text dataset with movie reviews classified as either positive or negative. |
| Dataset Splits | Yes | We construct 48k training samples and 12k evaluation samples and use a convolutional neural network with 5 layers, obtaining 0.999 accuracy. ... We use 26905 images for training and 2965 images for evaluation. ... We use 37500 reviews for training and 12500 for testing. |
| Hardware Specification | Yes | The computational cost for discovering concepts and calculating concept SHAP is about 3 hours for the AwA dataset and less than 20 minutes for the toy dataset and IMDB, using a single 1080 Ti GPU, which can be further accelerated with parallelism. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | To calculate the completeness score, we can set g to be a DNN or a simple linear projection, and optimize using stochastic gradient descent. In our experiments, we simply set g to be a two-layer perceptron with 500 hidden units. ... For k-means and PCA, we take the embedding of the patch as input to be consistent with our method. ... K is a hyperparameter that is usually chosen based on domain knowledge of the desired frequency of concepts. In our results, we fix K to be half of the average class size in our experiments. When using batch update, we find that picking K = (batch size × average class ratio)/2 works well in our experiments... (a minimal sketch of this setup follows the table) |
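The experiment-setup row is concrete enough to sketch. Below is a minimal, hypothetical Python/PyTorch rendering of the surrogate g (a two-layer perceptron with 500 hidden units, trained with SGD) and the K heuristic quoted above; the framework choice, class and function names, and the training loop are assumptions for illustration, not the authors' released implementation (see their repository linked above for the actual code).

```python
# Hypothetical sketch only: the paper specifies g as a two-layer perceptron
# (500 hidden units) optimized with SGD; PyTorch and all names here are assumptions.
import torch
import torch.nn as nn


class ConceptToLogits(nn.Module):
    """Two-layer perceptron g: concept scores -> class logits."""

    def __init__(self, n_concepts: int, n_classes: int, hidden: int = 500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_concepts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, concept_scores: torch.Tensor) -> torch.Tensor:
        return self.net(concept_scores)


def pick_k(batch_size: int, average_class_ratio: float) -> int:
    """Batch-update heuristic quoted from the paper:
    K = (batch size * average class ratio) / 2."""
    return max(1, int(batch_size * average_class_ratio / 2))


def fit_g(g: nn.Module, concept_scores: torch.Tensor, labels: torch.Tensor,
          epochs: int = 10, lr: float = 1e-2) -> nn.Module:
    """Fit g with SGD so predictions from concept scores match the labels;
    the completeness score is then read off from how much accuracy the
    concepts recover."""
    opt = torch.optim.SGD(g.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(g(concept_scores), labels)
        loss.backward()
        opt.step()
    return g
```

As a usage note, one would pass the per-sample concept scores (the outputs of the discovered concepts on the backbone's embeddings) as `concept_scores` and the task labels as `labels`; the linear-projection variant mentioned in the quote corresponds to replacing the two-layer network with a single `nn.Linear`.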