Learning Interpretable Concept Groups in CNNs
Authors: Saurabh Varshneya, Antoine Ledent, Robert A. Vandermeulen, Yunwen Lei, Matthias Enders, Damian Borth, Marius Kloft
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantitatively evaluate CGL's model interpretability using standard interpretability evaluation techniques and find that our method increases interpretability scores in most cases. We also analyze the filters of our model qualitatively by visualizing the filters' activations and find that our training setup yields representations with noticeably more interpretable filters. |
| Researcher Affiliation | Collaboration | Saurabh Varshneya1, Antoine Ledent1, Robert A. Vandermeulen2, Yunwen Lei3, Matthias Enders4, Damian Borth5 and Marius Kloft1. 1Technical University of Kaiserslautern, Germany; 2Technical University of Berlin, Germany; 3University of Birmingham, United Kingdom; 4NPZ Innovation GmbH, Germany; 5University of St. Gallen, Switzerland. {varshneya, ledent, kloft}@cs.uni-kl.de, vandermeulen@tu-berlin.de, y.lei@bham.ac.uk, m.enders@npz-innovation.de, damian.borth@unisg.ch |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at: https://github.com/srb-cv/cgl |
| Open Datasets | Yes | We constructed a synthetic dataset with obvious human-interpretable visual concepts corresponding to shape and color... We tested our methods on three well-known convolutional architectures named Alexnet, Alexnet-B and VGG... All the networks are trained from scratch on the two well-known image classification datasets Places365 [Zhou et al., 2017] and ImageNet [Deng et al., 2009]. For evaluation, we exactly replicate the method from [Bau et al., 2017], as explained above, to compute the interpretability of the trained models. |
| Dataset Splits | Yes | In order to quantify the interpretability of a trained CNN, we exactly replicate the evaluation on the Broden dataset as proposed by [Bau et al., 2017]. The Broden dataset contains pixel-wise annotations of a broad range of categories, which belong to one of six human-interpretable visual concepts. We compute the alignment of a filter with a category by comparing its activation maps for a set of images against the available ground truth using the same threshold and scoring function described by [Bau et al., 2017]. Table 3: A comparison of the Validation Accuracy and RUD Score on training with different regularizers. (A hedged sketch of this threshold-and-IoU scoring appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify its version number or any other software dependencies with version information. |
| Experiment Setup | Yes | For training, we use a simple CNN with 2 convolutional layers, having 128 and 256 filters in the first and second layer respectively. We set the size of filters to 3 × 3 for both the layers. The value for r can be set as a hyperparameter. We find that setting r to 3N_l^g works well in practice. λ_bn, λ_g and λ_s are the weightings of the defined regularization and auxiliary losses with respect to the main loss. (A hedged PyTorch sketch of this setup follows the table.) |
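
The interpretability evaluation quoted in the Dataset Splits row follows the Network Dissection protocol of [Bau et al., 2017]: each filter's activation map is upsampled, thresholded, and compared against Broden's pixel-wise concept annotations with an intersection-over-union (IoU) score. The snippet below is a minimal sketch of that scoring step, assuming NumPy arrays for the activation map and concept mask; the function names and the quantile-based threshold are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of Network Dissection-style filter scoring (Bau et al., 2017).
# Assumes the activation map has already been upsampled to the mask resolution.
import numpy as np

def filter_threshold(all_activations: np.ndarray, quantile: float = 0.995) -> float:
    """Per-filter threshold chosen so that only the top activations over the
    whole dataset exceed it (a top-quantile rule, as in Network Dissection)."""
    return float(np.quantile(all_activations, quantile))

def iou_score(act_map: np.ndarray, concept_mask: np.ndarray, threshold: float) -> float:
    """IoU between the thresholded activation map and a binary concept mask."""
    binarized = act_map > threshold
    intersection = np.logical_and(binarized, concept_mask).sum()
    union = np.logical_or(binarized, concept_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0
```

A filter is then reported as aligned with the concept whose IoU score is highest, aggregated over the evaluation images.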
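The Experiment Setup row describes a two-layer CNN with 128 and 256 filters and 3 × 3 kernels used on the synthetic dataset. Below is a minimal PyTorch sketch matching those numbers; the activations, pooling, classifier head, class count, and the `group_norm_penalty` helper are assumptions added for completeness, and the penalty is only a placeholder for how a weighted regularizer such as λ_g could enter the loss, not the exact CGL formulation.

```python
# Minimal sketch (not the authors' released code) of the 2-layer CNN described
# in the setup: 128 and 256 filters with 3x3 kernels. ReLU, pooling, and the
# linear classifier are assumptions.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=3, padding=1),    # layer 1: 128 filters, 3x3
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),  # layer 2: 256 filters, 3x3
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))

def group_norm_penalty(conv_weight: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Placeholder group-sparsity penalty: split the filters into concept groups
    along the output-channel axis and sum the per-group L2 norms."""
    return sum(g.norm(p=2) for g in conv_weight.chunk(num_groups, dim=0))
```

In a training step, such a term would be added to the classification loss with its weighting, e.g. `loss = ce_loss + lambda_g * group_norm_penalty(model.features[0].weight, num_groups)`.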