ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder

Authors: Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compared the performance of ViT-NeT with other state-of-the-art methods using widely used fine-grained visual categorization benchmark datasets and experimentally proved that the proposed method is superior in terms of classification performance and interpretability.
Researcher Affiliation | Academia | Sangwon Kim, Jaeyeal Nam, Byoung Chul Ko; Department of Computer Engineering, Keimyung University, Daegu, South Korea.
Pseudocode | Yes | Algorithm 1: Training a ViT-NeT
Open Source Code | Yes | The code and models are publicly available at https://github.com/jumpsnack/ViT-NeT.
Open Datasets | Yes | Datasets: We evaluated our ViT-NeT on three FGVC datasets: CUB-200-2011 (Wah et al., 2011), Stanford Cars (Krause et al., 2013), and Stanford Dogs (Khosla et al., 2011), and compared our model with previous SOTA models in terms of accuracy and interpretability.
Dataset Splits | No | The paper provides training and testing splits for each dataset (e.g., 'CUB-200-2011... 5,994 training images and 5,794 testing images') but does not explicitly mention a validation split.
Hardware Specification | Yes | Training and testing were conducted using four NVIDIA Tesla V100 32GB GPUs with APEX.
Software Dependencies | No | The paper mentions software such as 'PyTorch', 'AdamW optimizer', and 'APEX', but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The learning rate was initialized as 2e-5 for CUB-200-2011, 2e-4 for Stanford Dogs, and 2e-3 for Stanford Cars. The batch size was set to 16.
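
As a rough illustration of how the reported hyperparameters fit together, below is a minimal PyTorch sketch, assuming AdamW with the per-dataset initial learning rates, batch size 16, and standard mixed-precision training. The dummy linear model, random dataset, and `make_training_setup` helper are hypothetical placeholders, not the authors' released ViT-NeT code.

```python
# Minimal, runnable sketch of the reported training configuration.
# The tiny linear model and random TensorDataset stand in for ViT-NeT and the
# FGVC datasets; they are placeholders, not the authors' implementation.
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Initial learning rates reported in the paper, per dataset.
LEARNING_RATES = {
    "cub200": 2e-5,         # CUB-200-2011
    "stanford_dogs": 2e-4,  # Stanford Dogs
    "stanford_cars": 2e-3,  # Stanford Cars
}
BATCH_SIZE = 16  # batch size reported in the paper


def make_training_setup(dataset_name: str, num_classes: int = 200):
    # Placeholder model and data; a real run would build ViT-NeT and load
    # the corresponding FGVC dataset here.
    model = nn.Linear(768, num_classes)
    data = TensorDataset(torch.randn(64, 768),
                         torch.randint(0, num_classes, (64,)))
    loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATES[dataset_name])
    # Native AMP gradient scaler as a stand-in for the APEX mixed-precision
    # setup mentioned in the paper.
    scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
    return model, loader, optimizer, scaler


model, loader, optimizer, scaler = make_training_setup("cub200")
```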