A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
Authors: Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Edward Carlyn, Samuel Stevens, Kaiya L. Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate this on multiple datasets, including CUB-200-2011 (Wah et al., 2011), Birds-525 (Piosenka, 2023), Oxford Pet (Parkhi et al., 2012), Stanford Dogs (Khosla et al., 2011), Stanford Cars (Krause et al., 2013), FGVC-Aircraft (Maji et al., 2013), iNaturalist-2021 (Van Horn et al., 2021), and Cambridge butterfly (Montejo-Kovacevich et al., 2020). Table 1: Dataset statistics. Table 2: Accuracy (%) comparison. Evaluation. We reiterate that achieving a high classification accuracy is not the goal of this paper. The goal is to demonstrate the interpretability. We thus focus our evaluation on qualitative results. 4.1 EXPERIMENTAL RESULTS |
| Researcher Affiliation | Collaboration | 1The Ohio State University 2Amazon Alexa 3Virginia Tech 4Princeton University 5Rensselaer Polytechnic Institute |
| Pseudocode | No | The paper describes the model architecture and training process in text and with diagrams (Figure 2), but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and pre-trained models are publicly accessible at the Imageomics Institute Git Hub site: https://github.com/Imageomics/INTR. |
| Open Datasets | Yes | We validate this on multiple datasets, including CUB-200-2011 (Wah et al., 2011), Birds-525 (Piosenka, 2023), Oxford Pet (Parkhi et al., 2012), Stanford Dogs (Khosla et al., 2011), Stanford Cars (Krause et al., 2013), FGVC-Aircraft (Maji et al., 2013), iNaturalist-2021 (Van Horn et al., 2021), and Cambridge butterfly (Montejo-Kovacevich et al., 2020). |
| Dataset Splits | No | While train and test splits are detailed, a clear validation split for reproducibility across all experiments is not provided. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, count, or memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list the software dependencies or version numbers (e.g., framework and library versions) needed to reproduce the results. |
| Experiment Setup | Yes | Training detail. The hyper-parameter details such as epochs, learning rate, and batch size for training INTR are reported in Appendix E. We use the Adam optimizer (Kingma & Ba, 2014) with its default hyper-parameters. We train INTR using the StepLR scheduler with a learning rate drop at 80 epochs. The rest of the hyper-parameters follow DETR. During our experiment, for all datasets, except for Bird, we set the learning rate to 1e-4, while for Bird, we use a learning rate of 5e-5. Additionally, we utilize a batch size of 16 for Bird, Dog, and Fish datasets, and a batch size of 12 for the other datasets. Furthermore, the number of epochs required for training is 100 for BF and Pet datasets, 170 for Dog, and 140 for the remaining datasets. |
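For concreteness, the quoted training recipe can be sketched as a minimal PyTorch training loop. This is an illustration under stated assumptions, not code from the INTR repository: `build_intr_model` and `build_dataloader` are hypothetical placeholders, and only the optimizer choice, scheduler, learning rates, batch sizes, and epoch counts are taken from the paper's description.

```python
# Minimal sketch of the quoted INTR training setup (assumptions labeled below).
import torch

DATASET = "cub"  # e.g., "cub", "bird", "dog", "pet", "bf", "fish" (hypothetical keys)

# Per the paper: learning rate 5e-5 for Birds-525 ("Bird"), 1e-4 otherwise.
lr = 5e-5 if DATASET == "bird" else 1e-4
# Batch size 16 for Bird, Dog, and Fish; 12 for the other datasets.
batch_size = 16 if DATASET in {"bird", "dog", "fish"} else 12
# Epochs: 100 for BF and Pet, 170 for Dog, 140 for the remaining datasets.
epochs = {"bf": 100, "pet": 100, "dog": 170}.get(DATASET, 140)

model = build_intr_model()                      # hypothetical: DETR-style INTR model
loader = build_dataloader(DATASET, batch_size)  # hypothetical data pipeline

# Adam with default hyper-parameters; StepLR drops the LR at epoch 80
# (gamma defaults to 0.1 in torch.optim.lr_scheduler.StepLR).
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=80)

for epoch in range(epochs):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = model(images, labels)  # hypothetical: forward pass returns training loss
        loss.backward()
        optimizer.step()
    scheduler.step()  # step once per epoch so the drop lands at epoch 80
```

Note that the sketch encodes exactly the per-dataset settings quoted above; any remaining hyper-parameters would follow DETR, as the paper states.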