On Completeness-aware Concept-Based Explanations in Deep Neural Networks

Authors: Chih-Kuan Yeh, Been Kim, Sercan Ö. Arık, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we demonstrate our method both on a synthetic dataset, where we have ground truth concept importance, as well as on real-world image and language datasets." |
| Researcher Affiliation | Collaboration | Chih-Kuan Yeh¹, Been Kim², Sercan Ö. Arık³, Chun-Liang Li³, Tomas Pfister³, and Pradeep Ravikumar¹ (¹Machine Learning Department, Carnegie Mellon University; ²Google Brain; ³Google Cloud AI) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "The code is released at https://github.com/chihkuanyeh/concept_exp." |
| Open Datasets | Yes | "We perform experiments on Animals with Attributes (AwA) [Lampert et al., 2009] that contains 50 animal classes. ... We apply our method on IMDB, a text dataset with movie reviews classified as either positive or negative." |
| Dataset Splits | Yes | "We construct 48k training samples and 12k evaluation samples and use a convolutional neural network with 5 layers, obtaining 0.999 accuracy. ... We use 26905 images for training and 2965 images for evaluation. ... We use 37500 reviews for training and 12500 for testing." |
| Hardware Specification | Yes | "The computational cost for discovering concepts and calculating ConceptSHAP is about 3 hours for the AwA dataset and less than 20 minutes for the toy dataset and IMDB, using a single 1080 Ti GPU, which can be further accelerated with parallelism." |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | "To calculate the completeness score, we can set g to be a DNN or a simple linear projection, and optimize using stochastic gradient descent. In our experiments, we simply set g to be a two-layer perceptron with 500 hidden units. ... For k-means and PCA, we take the embedding of the patch as input to be consistent to our method. ... K is a hyperparameter that is usually chosen based on domain knowledge of the desired frequency of concepts. In our results, we fix K to be half of the average class size in our experiments. When using batch update, we find that picking K = (batch size × average class ratio) / 2 works well in our experiments..." |
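For concreteness, the quoted Experiment Setup reduces to a small amount of code: g is a two-layer perceptron with 500 hidden units, trained with stochastic gradient descent to predict the label from per-example concept scores, and the completeness score is the resulting accuracy normalized against the original model's accuracy and chance. The sketch below is a minimal illustration of that recipe under stated assumptions, not the authors' released implementation; it uses PyTorch (the paper pins no framework), and the names (`ConceptToLogits`, `concept_scores`, `pick_k`), dimensions, and synthetic data are all hypothetical.

```python
import torch
import torch.nn as nn

class ConceptToLogits(nn.Module):
    """g: two-layer perceptron (500 hidden units) from K concept scores to class logits."""
    def __init__(self, n_concepts: int, n_classes: int, hidden: int = 500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_concepts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return self.net(scores)

def concept_scores(feats: torch.Tensor, concepts: torch.Tensor) -> torch.Tensor:
    """Project intermediate features (n, d) onto unit-norm concept vectors (d, K)."""
    return feats @ (concepts / concepts.norm(dim=0, keepdim=True))

def completeness(acc_concepts: float, acc_model: float, acc_random: float) -> float:
    """(a_g - a_r) / (a_f - a_r): 1.0 means the concepts retain all of the
    model's predictive information above chance."""
    return (acc_concepts - acc_random) / (acc_model - acc_random)

def pick_k(batch_size: int, avg_class_ratio: float) -> int:
    """Batch-update heuristic quoted above: K = (batch size * average class ratio) / 2."""
    return max(1, round(batch_size * avg_class_ratio / 2))

if __name__ == "__main__":
    torch.manual_seed(0)
    n, d, K, C = 512, 64, 10, 5          # samples, feature dim, concepts, classes (hypothetical)
    feats = torch.randn(n, d)            # stand-in for a frozen model's intermediate features
    concepts = torch.randn(d, K)         # stand-in for discovered concept vectors
    labels = torch.randint(0, C, (n,))
    g = ConceptToLogits(K, C)
    opt = torch.optim.SGD(g.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(200):                 # full-batch SGD, for brevity
        opt.zero_grad()
        loss_fn(g(concept_scores(feats, concepts)), labels).backward()
        opt.step()
    acc_g = (g(concept_scores(feats, concepts)).argmax(1) == labels).float().mean().item()
    # acc_model and acc_random below are illustrative placeholders.
    print(completeness(acc_g, acc_model=0.99, acc_random=1 / C))
```

Normalizing by the random-prediction accuracy makes a completeness of 0 correspond to chance and 1 to the full model, so the score stays comparable across datasets with different numbers of classes.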