A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Authors: Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Edward Carlyn, Samuel Stevens, Kaiya L Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate this on multiple datasets, including CUB-200-2011 (Wah et al., 2011), Birds-525 (Piosenka, 2023), Oxford Pet (Parkhi et al., 2012), Stanford Dogs (Khosla et al., 2011), Stanford Cars (Krause et al., 2013), FGVC-Aircraft (Maji et al., 2013), iNaturalist-2021 (Van Horn et al., 2021), and Cambridge butterfly (Montejo-Kovacevich et al., 2020). Table 1: Dataset statistics. Table 2: Accuracy (%) comparison. Evaluation. We reiterate that achieving a high classification accuracy is not the goal of this paper. The goal is to demonstrate the interpretability. We thus focus our evaluation on qualitative results. 4.1 EXPERIMENTAL RESULTS
Researcher Affiliation | Collaboration | 1The Ohio State University, 2Amazon Alexa, 3Virginia Tech, 4Princeton University, 5Rensselaer Polytechnic Institute
Pseudocode | No | The paper describes the model architecture and training process in text and with diagrams (Figure 2), but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
Open Datasets | Yes | We validate this on multiple datasets, including CUB-200-2011 (Wah et al., 2011), Birds-525 (Piosenka, 2023), Oxford Pet (Parkhi et al., 2012), Stanford Dogs (Khosla et al., 2011), Stanford Cars (Krause et al., 2013), FGVC-Aircraft (Maji et al., 2013), iNaturalist-2021 (Van Horn et al., 2021), and Cambridge butterfly (Montejo-Kovacevich et al., 2020).
Dataset Splits | No | While train and test splits are detailed, a clear validation split for reproducibility across all experiments is not provided.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, count, or memory) used to run the experiments.
Software Dependencies | No | The paper does not list the software dependencies or library versions needed to reproduce the results.
Experiment Setup | Yes | Training detail. The hyper-parameter details such as epochs, learning rate, and batch size for training INTR are reported in Appendix E. We use the Adam optimizer (Kingma & Ba, 2014) with its default hyper-parameters. We train INTR using the StepLR scheduler with a learning rate drop at 80 epochs. The rest of the hyper-parameters follow DETR. During our experiment, for all datasets, except for Bird, we set the learning rate to 1e-4, while for Bird, we use a learning rate of 5e-5. Additionally, we utilize a batch size of 16 for Bird, Dog, and Fish datasets, and a batch size of 12 for the other datasets. Furthermore, the number of epochs required for training is 100 for BF and Pet datasets, 170 for Dog, and 140 for the remaining datasets.
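
To make the training recipe quoted in the Experiment Setup row concrete, here is a minimal PyTorch sketch of the per-dataset optimizer and scheduler setup. The `model` argument is a placeholder for the INTR network, and the 0.1 decay factor is an assumption: the paper states only that a StepLR drop occurs at epoch 80 and that the remaining hyper-parameters follow DETR.

```python
import torch

# Per-dataset hyper-parameters as reported in the Experiment Setup row
# (full details are in Appendix E of the paper).
DATASET_CONFIG = {
    # dataset: (learning_rate, batch_size, epochs)
    "bird": (5e-5, 16, 140),
    "dog":  (1e-4, 16, 170),
    "fish": (1e-4, 16, 140),
    "bf":   (1e-4, 12, 100),  # Cambridge butterfly
    "pet":  (1e-4, 12, 100),
    # remaining datasets: lr 1e-4, batch size 12, 140 epochs
}

def build_training_setup(model: torch.nn.Module, dataset: str):
    """Return the optimizer, scheduler, batch size, and epoch budget."""
    lr, batch_size, epochs = DATASET_CONFIG[dataset]
    # Adam with its default hyper-parameters, per the paper.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Single learning-rate drop at epoch 80; gamma=0.1 is assumed from
    # DETR's default, which the paper says it otherwise follows.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=80, gamma=0.1)
    return optimizer, scheduler, batch_size, epochs
```

Calling `build_training_setup(model, "bird")` and stepping the scheduler once per epoch reproduces the reported schedule.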
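
Because the paper reports train and test splits but no validation split (see the Dataset Splits row), anyone reproducing the work would have to carve one out themselves. A minimal sketch, assuming a hypothetical `train_set` (e.g., the official CUB-200-2011 training split) and an illustrative fraction and seed not taken from the paper:

```python
import torch
from torch.utils.data import random_split

def make_val_split(train_set, val_fraction=0.1, seed=42):
    """Hold out a validation subset from an official training split.

    `train_set`, `val_fraction`, and `seed` are illustrative choices,
    not values taken from the paper.
    """
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    # Fixed generator so the split is identical across runs.
    generator = torch.Generator().manual_seed(seed)
    return random_split(train_set, [n_train, n_val], generator=generator)
```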