CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

Authors: Tuomas Oikarinen, Tsui-Wei Weng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide both qualitative and quantitative results of CLIP-Dissect in Sec 4.1 and 4.2 respectively. We also provide an ablation study on the choice of similarity function in Sec 4.3 and compare computational efficiency in Sec 4.4. Finally, we show that CLIP-Dissect can detect concepts that do not appear in the probing images in Sec 4.5. We evaluate our method by analyzing two pre-trained networks: ResNet-50 (He et al., 2016) trained on ImageNet (Deng et al., 2009), and ResNet-18 trained on Places365 (Zhou et al., 2017).
Researcher Affiliation | Academia | Tuomas Oikarinen (UCSD CSE, toikarinen@ucsd.edu); Tsui-Wei Weng (UCSD HDSI, lweng@ucsd.edu)
Pseudocode | Yes | Algorithm. There are 3 key steps in CLIP-Dissect: 1. Compute the concept-activation matrix P. 2. Record activations of target neurons. 3. Determine the neuron labels.
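
A minimal sketch of those three steps in PyTorch, assuming OpenAI's `clip` package and a torchvision ResNet-50. The helper names (`concept_activation_matrix`, `record_activations`, `label_neurons`) are illustrative rather than taken from the released repository, and plain cosine similarity stands in for the paper's soft-WPMI scoring:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/16", device=device)

# Step 1: concept-activation matrix P -- CLIP scores between every probing
# image and every concept word. probe_images is assumed to be a batch
# already preprocessed with clip_preprocess.
@torch.no_grad()
def concept_activation_matrix(probe_images, concepts):
    txt = clip_model.encode_text(clip.tokenize(concepts).to(device))  # (M, d)
    img = clip_model.encode_image(probe_images.to(device))            # (N, d)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    img = img / img.norm(dim=-1, keepdim=True)
    return img @ txt.T                                                # (N, M)

# Step 2: record activations of the target neurons on the same probing
# images, here the spatially mean-pooled channels of one ResNet-50 layer.
@torch.no_grad()
def record_activations(target_model, layer, probe_images):
    store = {}
    h = layer.register_forward_hook(
        lambda mod, inp, out: store.update(acts=out.mean(dim=(2, 3))))  # (N, C)
    target_model(probe_images.to(device))
    h.remove()
    return store["acts"]

# Step 3: label each neuron with the concept whose column of P best
# matches the neuron's activation pattern across the probing images.
def label_neurons(P, activations, concepts):
    A = activations.T.float()                 # (C, N): one row per neuron
    A = A / A.norm(dim=1, keepdim=True)
    Pc = P.T.float()                          # (M, N): one row per concept
    Pc = Pc / Pc.norm(dim=1, keepdim=True)
    return [concepts[j] for j in (A @ Pc.T).argmax(dim=1)]

target = models.resnet50(weights="IMAGENET1K_V1").to(device).eval()
# P = concept_activation_matrix(probe_images, concepts)
# A = record_activations(target, target.layer4, probe_images)
# labels = label_neurons(P, A, concepts)
```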
Open Source Code | Yes | Our code is available at https://github.com/Trustworthy-ML-Lab/CLIPdissect.
Open Datasets | Yes | We evaluate our method by analyzing two pre-trained networks: ResNet-50 (He et al., 2016) trained on ImageNet (Deng et al., 2009), and ResNet-18 trained on Places365 (Zhou et al., 2017). Our method can also be applied to modern architectures such as Vision Transformers, as discussed in Appendix A.5. Unless otherwise mentioned, we use the 20,000 most common English words as the concept set S.
Dataset Splits | Yes | Table 1: The cosine similarity of predicted labels compared to ground-truth labels on final-layer neurons of ResNet-50 trained on ImageNet. ... D_probe: ImageNet val, CIFAR100 train
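
The Table 1 metric can be sketched as follows, under the assumption that CLIP's text encoder provides the embedding space (the paper also reports a sentence-embedding variant); `label_similarity` is an illustrative name:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

@torch.no_grad()
def label_similarity(predicted: str, ground_truth: str) -> float:
    # Cosine similarity between the two labels in CLIP text-embedding space.
    emb = model.encode_text(clip.tokenize([predicted, ground_truth]).to(device))
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return (emb[0] @ emb[1]).item()

# Identical labels score 1.0, e.g. label_similarity("goldfish", "goldfish");
# unrelated labels score much lower.
```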
Hardware Specification | Yes | Table 4: The time it takes to describe the layers [conv1, layer1, layer2, layer3, layer4] of ResNet-50 via different methods using our hardware (Tesla P100 GPU).
Software Dependencies | No | The paper mentions 'torchvision' and implicitly uses PyTorch, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For our experiments we use a = 2, λ = 0.6 and the top 28 most highly activating images for neuron k as B_k, which were found to give the best quantitative results when describing final-layer neurons of ResNet-50. For other hyperparameters we used a = 10 and λ = 1.
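
The a and λ values parameterize the (soft-)WPMI similarity function. A rough sketch of the WPMI score, under the assumption that p(t_m | x_i) is obtained by a softmax over the CLIP logits in P and that the marginal p(t_m) is estimated by averaging over all probing images (the paper's estimator differs in detail):

```python
import torch

def wpmi_scores(P, top_ids, a=2.0, lam=0.6):
    """Sketch of wpmi(t_m, k) = log p(t_m | B_k) - lam * log p(t_m).
    P: (N, M) CLIP image-concept logits; top_ids: indices of the K most
    highly activating probing images B_k for one neuron k."""
    log_p = torch.log_softmax(P, dim=1)        # log p(t_m | x_i), row-wise
    log_cond = a * log_p[top_ids].sum(dim=0)   # log p(t_m | B_k), shape (M,)
    # Marginal estimated from all probing images, scaled to match log_cond.
    log_marg = a * len(top_ids) * log_p.mean(dim=0)
    return log_cond - lam * log_marg           # (M,)
```

The best-scoring concept for neuron k would then be `concepts[wpmi_scores(P, top_ids).argmax()]`; larger a sharpens the contribution of each top image, while λ trades off against the marginal concept frequency.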