On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Authors: Chih-Kuan Yeh, Been Kim, Sercan Ö. Arık, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate our method both on a synthetic dataset, where we have ground truth concept importance, as well as on real-world image and language datasets. |
| Researcher Affiliation | Collaboration | Chih-Kuan Yeh1, Been Kim2, Sercan Ö. Arık3, Chun-Liang Li3, Tomas Pfister3, and Pradeep Ravikumar1 1Machine Learning Department, Carnegie Mellon University 2Google Brain 3Google Cloud AI |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/chihkuanyeh/concept_exp. |
| Open Datasets | Yes | We perform experiments on Animals with Attributes (AwA) [Lampert et al., 2009] that contains 50 animal classes. ... We apply our method on IMDB, a text dataset with movie reviews classified as either positive or negative. |
| Dataset Splits | Yes | We construct 48k training samples and 12k evaluation samples and use a convolutional neural network with 5 layers, obtaining 0.999 accuracy. ... We use 26905 images for training and 2965 images for evaluation. ... We use 37500 reviews for training and 12500 for testing. |
| Hardware Specification | Yes | The computational cost for discovering concepts and calculating concept SHAP is about 3 hours for the AwA dataset and less than 20 minutes for the toy dataset and IMDB, using a single 1080 Ti GPU, which can be further accelerated with parallelism. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | To calculate the completeness score, we can set g to be a DNN or a simple linear projection, and optimize using stochastic gradient descent. In our experiments, we simply set g to be a two-layer perceptron with 500 hidden units. ... For k-means and PCA, we take the embedding of the patch as input to be consistent with our method. ... K is a hyperparameter that is usually chosen based on domain knowledge of the desired frequency of concepts. In our results, we fix K to be half of the average class size in our experiments. When using batch update, we find that picking K = (batch size × average class ratio)/2 works well in our experiments... (a minimal sketch of this setup follows the table) |
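The experiment-setup row is concrete enough to sketch. Below is a minimal, hypothetical Python/PyTorch rendering of the surrogate g (a two-layer perceptron with 500 hidden units, trained with SGD) and the K heuristic quoted above; the framework choice, class and function names, and the training loop are assumptions for illustration, not the authors' released implementation (see their repository linked above for the actual code).

```python
# Hypothetical sketch only: the paper specifies g as a two-layer perceptron
# (500 hidden units) optimized with SGD; PyTorch and all names here are assumptions.
import torch
import torch.nn as nn


class ConceptToLogits(nn.Module):
    """Two-layer perceptron g: concept scores -> class logits."""

    def __init__(self, n_concepts: int, n_classes: int, hidden: int = 500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_concepts, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, concept_scores: torch.Tensor) -> torch.Tensor:
        return self.net(concept_scores)


def pick_k(batch_size: int, average_class_ratio: float) -> int:
    """Batch-update heuristic quoted from the paper:
    K = (batch size * average class ratio) / 2."""
    return max(1, int(batch_size * average_class_ratio / 2))


def fit_g(g: nn.Module, concept_scores: torch.Tensor, labels: torch.Tensor,
          epochs: int = 10, lr: float = 1e-2) -> nn.Module:
    """Fit g with SGD so predictions from concept scores match the labels;
    the completeness score is then read off from how much accuracy the
    concepts recover."""
    opt = torch.optim.SGD(g.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(g(concept_scores), labels)
        loss.backward()
        opt.step()
    return g
```

As a usage note, one would pass the per-sample concept scores (the outputs of the discovered concepts on the backbone's embeddings) as `concept_scores` and the task labels as `labels`; the linear-projection variant mentioned in the quote corresponds to replacing the two-layer network with a single `nn.Linear`.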