CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Authors: Tuomas Oikarinen, Tsui-Wei Weng
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide both qualitative and quantitative results of CLIP-Dissect in Sec 4.1 and 4.2 respectively. We also provide an ablation study on the choice of similarity function in Sec 4.3 and compare computation efficiency in Sec 4.4. Finally, we show that CLIP-Dissect can detect concepts that do not appear in the probing images in Sec 4.5. We evaluate our method through analyzing two pre-trained networks: ResNet-50 (He et al., 2016) trained on ImageNet (Deng et al., 2009), and ResNet-18 trained on Places-365 (Zhou et al., 2017). |
| Researcher Affiliation | Academia | Tuomas Oikarinen, UCSD CSE, toikarinen@ucsd.edu; Tsui-Wei Weng, UCSD HDSI, lweng@ucsd.edu |
| Pseudocode | Yes | Algorithm. There are 3 key steps in CLIP-Dissect: 1. Compute the concept-activation matrix P. 2. Record activations of target neurons. 3. Determine the neuron labels. (A minimal code sketch of these three steps follows the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/Trustworthy-ML-Lab/CLIPdissect. |
| Open Datasets | Yes | We evaluate our method through analyzing two pre-trained networks: ResNet-50 (He et al., 2016) trained on ImageNet (Deng et al., 2009), and ResNet-18 trained on Places-365 (Zhou et al., 2017). Our method can also be applied to modern architectures such as Vision Transformers as discussed in Appendix A.5. Unless otherwise mentioned we use the 20,000 most common English words as the concept set S. |
| Dataset Splits | Yes | Table 1: The cosine similarity of predicted labels compared to ground truth labels on final layer neurons of ResNet-50 trained on ImageNet. ... D_probe: ImageNet val, CIFAR100 train |
| Hardware Specification | Yes | Table 4: The time it takes to describe the layers [conv1, layer1, layer2, layer3, layer4] of ResNet-50 via different methods using our hardware (Tesla P100 GPU). |
| Software Dependencies | No | The paper mentions 'torchvision' and implicitly uses PyTorch, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For our experiments we use a = 2, λ = 0.6 and the top 28 most highly activating images for neuron k as B_k, which were found to give the best quantitative results when describing final layer neurons of ResNet-50. For other experiments we used a = 10 and λ = 1. (See the WPMI sketch after the table.) |
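
The three steps in the Pseudocode row map directly onto a few lines of code. Below is a minimal sketch assuming OpenAI's `clip` package; the toy concept list, the random stand-ins for the probe images and neuron activations, and the plain cosine scoring in step 3 are illustrative assumptions, not the released implementation (which defaults to a soft-WPMI similarity).

```python
# Minimal sketch of CLIP-Dissect's three steps, assuming OpenAI's `clip` package.
# Toy concepts, random probe images, and cosine scoring are placeholders.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

concepts = ["dog", "stripes", "forest"]                # stand-in for the 20k-word set S
probe_images = torch.randn(8, 3, 224, 224).to(device)  # stand-in for preprocessed D_probe

with torch.no_grad():
    # Step 1: concept-activation matrix P (rows: probe images, columns: concepts).
    txt = model.encode_text(clip.tokenize(concepts).to(device))
    img = model.encode_image(probe_images)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    img = img / img.norm(dim=-1, keepdim=True)
    P = (img @ txt.T).float()                          # shape (|D_probe|, |S|)

# Step 2: record activations q_k of the target neuron over the same probe images,
# e.g. via a forward hook on the target layer (random stand-in here).
q_k = torch.randn(len(probe_images), device=device)

# Step 3: label the neuron with the concept whose column of P best matches q_k.
# Plain cosine similarity is shown; the paper ablates this choice in Sec 4.3.
cols = P / P.norm(dim=0, keepdim=True)
scores = (q_k / q_k.norm()) @ cols
print(concepts[scores.argmax().item()])
```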
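The Experiment Setup row's hyperparameters (a, λ, and the number of top activating images forming B_k) parameterize the WPMI-style similarity used to pick labels. The sketch below shows one plausible reading of that scoring; the softmax conversion of P into per-image concept probabilities and the estimate of p(t) are our assumptions, not the authors' exact code.

```python
import torch

def wpmi_scores(P: torch.Tensor, q_k: torch.Tensor,
                a: float = 10.0, lam: float = 1.0, K: int = 28) -> torch.Tensor:
    """Score every concept t for neuron k: wpmi(t) = log p(t|B_k) - lam * log p(t).

    Assumptions (not from the paper's code): rows of P are converted to
    per-image concept probabilities with a softmax, and p(t) is estimated
    from the mean probability over all probe images.
    """
    probs = P.softmax(dim=1)                  # p(t_m | x_i), one row per probe image
    B_k = q_k.topk(K).indices                 # B_k: top-K most highly activating images
    log_p_t_given_B = a * probs[B_k].log().sum(dim=0)  # log prod_{x in B_k} p(t|x)^a
    log_p_t = probs.mean(dim=0).log()         # crude estimate of log p(t)
    return log_p_t_given_B - lam * log_p_t

# Usage with the settings quoted above for final-layer ResNet-50 neurons:
# scores = wpmi_scores(P, q_k, a=2.0, lam=0.6, K=28)
# label = concepts[scores.argmax().item()]
```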