ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder
Authors: Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compared the performance of ViT-NeT with other state-of-the-art methods on widely used fine-grained visual categorization benchmark datasets and experimentally showed that the proposed method is superior in terms of classification performance and interpretability. |
| Researcher Affiliation | Academia | Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko, Department of Computer Engineering, Keimyung University, Daegu, South Korea. |
| Pseudocode | Yes | Algorithm 1: Training a ViT-NeT |
| Open Source Code | Yes | The code and models are publicly available at https://github.com/jumpsnack/ViT-NeT. |
| Open Datasets | Yes | Datasets: We evaluated our ViT-NeT on three FGVC datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013), and Stanford Dogs (Khosla et al., 2011), and compared our model with previous SOTA models in terms of accuracy and interpretability. |
| Dataset Splits | No | The paper provides details for training and testing splits for each dataset (e.g., 'CUB-200-2011... 5,994 training images and 5,794 testing images'), but does not explicitly mention a validation set split. |
| Hardware Specification | Yes | Training and testing were conducted using four NVIDIA Tesla V100 32GB GPUs with APEX. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the AdamW optimizer, and APEX, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | The learning rate was initialized as 2e-5 for CUB-200-2011, 2e-4 for Stanford Dogs, and 2e-3 for Stanford Cars. The batch size was set to 16. |
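
Based on the setup quoted above (AdamW optimizer, batch size 16, and per-dataset initial learning rates of 2e-5, 2e-4, and 2e-3), a minimal PyTorch sketch of the training configuration might look as follows. The weight decay value, number of data-loader workers, and the `build_training` helper are assumptions for illustration only; they are not specified in the paper excerpt.

```python
# Minimal sketch of the reported training configuration.
# Assumptions (not stated in the excerpt): weight decay, num_workers,
# and the build_training helper are placeholders for illustration.
import torch
from torch.utils.data import DataLoader

# Per-dataset initial learning rates reported in the paper.
LEARNING_RATES = {
    "cub_200_2011": 2e-5,
    "stanford_dogs": 2e-4,
    "stanford_cars": 2e-3,
}
BATCH_SIZE = 16  # reported batch size


def build_training(dataset_name, model, train_dataset):
    """Return an optimizer and data loader mirroring the reported setup."""
    optimizer = torch.optim.AdamW(      # paper reports the AdamW optimizer
        model.parameters(),
        lr=LEARNING_RATES[dataset_name],
        weight_decay=5e-4,              # assumed value, not given in the excerpt
    )
    loader = DataLoader(
        train_dataset,
        batch_size=BATCH_SIZE,
        shuffle=True,
        num_workers=4,                  # assumed
        pin_memory=True,                # helpful on the reported V100 GPUs
    )
    return optimizer, loader
```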