Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder
Authors: Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compared the performance of ViT-NeT with other state-of-the-art methods using widely used fine-grained visual categorization benchmark datasets and experimentally proved that the proposed method is superior in terms of the classification performance and interpretability. |
| Researcher Affiliation | Academia | Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko (Department of Computer Engineering, Keimyung University, Daegu, South Korea). |
| Pseudocode | Yes | Algorithm 1: Training a ViT-NeT |
| Open Source Code | Yes | The code and models are publicly available at https://github.com/jumpsnack/ViT-NeT. |
| Open Datasets | Yes | Datasets: We evaluated our ViT-NeT on three FGVC datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013), and Stanford Dogs (Khosla et al., 2011), and compared our model with previous SOTA models in terms of accuracy and interpretability. |
| Dataset Splits | No | The paper provides details for training and testing splits for each dataset (e.g., 'CUB-200-2011... 5,994 training images and 5,794 testing images'), but does not explicitly mention a validation set split. |
| Hardware Specification | Yes | Training and testing were conducted using four NVIDIA Tesla V100 32GB GPUs with APEX. |
| Software Dependencies | No | The paper mentions software like 'PyTorch', the 'AdamW optimizer', and 'APEX', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The learning rate was initialized as 2e-5 for CUB-200-2011, 2e-4 for Stanford Dogs, and 2e-3 for Stanford Cars. The batch size was set to 16. |
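The reported experiment setup can be summarized as a small configuration sketch. Only the values (learning rates, batch size, optimizer name) come from the paper; the dictionary structure and the `initial_lr` helper are illustrative, not part of the released code.

```python
# Hyperparameters reported in the ViT-NeT paper; the structure and
# function names here are illustrative, not from the official repo.
TRAIN_CONFIG = {
    "optimizer": "AdamW",  # named in the paper, version unspecified
    "batch_size": 16,
    "initial_lr": {
        "CUB-200-2011": 2e-5,
        "Stanford Dogs": 2e-4,
        "Stanford Cars": 2e-3,
    },
}


def initial_lr(dataset: str) -> float:
    """Look up the dataset-specific initial learning rate."""
    return TRAIN_CONFIG["initial_lr"][dataset]
```

Note the three-order-of-magnitude spread in initial learning rates across datasets, which is why the setup is reported per dataset rather than as a single value.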