CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization
Authors: Xiaohan Yu, Jun Wang, Yongsheng Gao
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CLE-ViT demonstrates strong performance on 7 publicly available datasets, confirming its effectiveness in the ultra-FGVC task. |
| Researcher Affiliation | Academia | ¹School of Engineering and Built Environment, Griffith University, Australia; ²Department of Computer Science, University of Warwick, UK. {xiaohan.yu, yongsheng.gao}@griffith.edu.au, jun.wang.3@warwick.ac.uk |
| Pseudocode | No | The paper describes the methods textually and with diagrams (e.g., Figure 4), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Markin-Wang/CLEViT |
| Open Datasets | Yes | Following [Yu et al., 2023], five ultra-fine-grained image datasets are adopted for evaluation, including Cotton80, SoyLocal, SoyGene, SoyAgeing, and SoyGlobal. Moreover, two fine-grained datasets, the Apple foliar disease dataset [Thapa et al., 2020] and CUB-200-2011 (CUB) [Wah et al., 2011], are also used to further verify the effectiveness of the proposed method. |
| Dataset Splits | No | Table 1 provides the number of training and test images for each dataset, but does not explicitly mention a separate validation split or its size/proportion. |
| Hardware Specification | No | The paper mentions using a Swin Transformer Base as the backbone model but does not specify any hardware details like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and adopting standard data augmentation techniques, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The proportion of the masked region and the number of parts n are set to [0.15, 0.45] and 4, respectively. The margin β in Equation 7 is 1. λ and γ are both set to 1 for all datasets, except 0.3 and 0.5 for the CUB dataset. ...input images are first resized to 600×600 for all datasets. Random (center) cropping is then applied to crop the images to 448×448 during the training (inference) phase. After that, we adopt random horizontal flipping, color jitter, and random rotation during training. The whole architecture is optimized by the AdamW optimizer. In our experiment settings, the batch size and the learning rate are set to 12 and 1e-3 for all datasets. |
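The hyper-parameters quoted in this row map onto a conventional PyTorch training configuration. The sketch below is a minimal, hypothetical reconstruction of the preprocessing, optimizer, and triplet-margin settings described above; it is not the authors' released implementation (see the repository linked earlier), and the jitter strengths, rotation range, and backbone are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms, models

# Training-time augmentation as quoted above: resize to 600x600, random crop
# to 448x448, then horizontal flip, color jitter, and random rotation.
# The jitter strengths and rotation range below are assumptions; the paper
# does not report them.
train_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

# Inference-time preprocessing: resize to 600x600, center crop to 448x448.
test_transform = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])

# Margin-based triplet objective with beta = 1, matching the stated margin
# in Equation 7; how this term is weighted against the classification loss
# (via lambda and gamma) is only summarized in the row above.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Placeholder backbone (the paper uses Swin-Base, not ResNet) and AdamW with
# the reported learning rate 1e-3; e.g., Cotton80 has 80 classes.
model = models.resnet18(num_classes=80)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```

A DataLoader built with `train_transform` and the reported batch size of 12 would complete this configuration; the masked-region proportion in [0.15, 0.45] and n = 4 parts belong to CLE-ViT's view-construction step, which is not reconstructed here.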