CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

Authors: Xiaohan Yu, Jun Wang, Yongsheng Gao

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CLE-ViT demonstrates strong performance on seven publicly available datasets, confirming its effectiveness on the ultra-FGVC task.
Researcher Affiliation | Academia | School of Engineering and Built Environment, Griffith University, Australia; Department of Computer Science, University of Warwick, UK. {xiaohan.yu, yongsheng.gao}@griffith.edu.au, jun.wang.3@warwick.ac.uk
Pseudocode | No | The paper describes the method textually and with diagrams (e.g., Figure 4), but it does not include any formal pseudocode or algorithm blocks; a hedged illustrative sketch is given after the table.
Open Source Code | Yes | The code is available at https://github.com/Markin-Wang/CLEViT
Open Datasets | Yes | Following [Yu et al., 2023], five ultra-fine-grained image datasets are adopted for evaluation: Cotton80, SoyLocal, SoyGene, SoyAgeing, and SoyGlobal. Moreover, two fine-grained datasets, the Apple Foliar Disease dataset [Thapa et al., 2020] and CUB-200-2011 (CUB) [Wah et al., 2011], are also used to further verify the effectiveness of the proposed method.
Dataset Splits | No | Table 1 provides the number of training and test images for each dataset, but no separate validation split (or its size/proportion) is mentioned.
Hardware Specification | No | The paper states that a Swin Transformer Base backbone is used, but gives no hardware details (GPU model, CPU type, or memory) for the experiments.
Software Dependencies | No | The paper mentions the AdamW optimizer and standard data augmentation techniques, but it does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The proportion of the masked region and the number of parts n are set to [0.15, 0.45] and 4, respectively. The margin β in Equation 7 is 1. λ and γ are both set to 1 for all datasets, except 0.3 and 0.5 for the CUB dataset. ... input images are first resized to 600 × 600 for all datasets. Random (Center) cropping is then applied to crop the images into 448 × 448 during the training (inference) phase. After that, we adopt random horizontal flipping, color jitter, and random rotation during the training. The whole architecture is optimized by the AdamW optimizer. In our experiment settings, the batch size and the learning rate are set to 12 and 1e-3 for all the datasets. (A hedged code reconstruction of this setup follows below.)
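Since the paper provides no pseudocode, the following is only a rough PyTorch sketch of the kind of mask-and-shuffle positive-view generation and margin-based contrastive (triplet) loss that the quoted hyperparameters suggest: a mask ratio drawn from [0.15, 0.45], n = 4 parts, and margin β = 1 in Equation 7. All function names and the exact masking/shuffling scheme here are assumptions, not the authors' implementation; consult the linked repository for the real code.

```python
import random
import torch
import torch.nn.functional as F

def mask_and_shuffle(img, mask_range=(0.15, 0.45), n_parts=4):
    """Hypothetical positive-view generator: mask a random region of the
    image, then split it into an n_parts x n_parts grid and permute the
    tiles. The exact scheme in CLE-ViT may differ; this is a sketch."""
    c, h, w = img.shape
    out = img.clone()

    # Zero out a random rectangle covering a ratio drawn from mask_range.
    ratio = random.uniform(*mask_range)
    mh, mw = int(h * ratio ** 0.5), int(w * ratio ** 0.5)
    top, left = random.randint(0, h - mh), random.randint(0, w - mw)
    out[:, top:top + mh, left:left + mw] = 0.0

    # Split into an n_parts x n_parts grid and shuffle the tiles.
    ph, pw = h // n_parts, w // n_parts
    tiles = [out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
             for i in range(n_parts) for j in range(n_parts)]
    random.shuffle(tiles)
    rows = [torch.cat(tiles[k * n_parts:(k + 1) * n_parts], dim=2)
            for k in range(n_parts)]
    return torch.cat(rows, dim=1)

def triplet_loss(anchor, positive, negative, beta=1.0):
    """Standard triplet margin loss; beta = 1 matches the paper's margin."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + beta).mean()
```

Presumably the anchor/positive pair would be the embeddings of an image and its masked-and-shuffled view, with negatives drawn from other images in the batch, but that pairing is inferred from the paper's contrastive framing rather than stated in this report.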
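The experiment-setup row translates fairly directly into a training configuration. Below is a minimal, hedged reconstruction in PyTorch/torchvision: the transform ordering follows the paper's description, while the color-jitter strength and rotation range are assumptions, and the Swin Transformer Base backbone is replaced by a placeholder module.

```python
import torch
from torch.optim import AdamW
from torchvision import transforms

# Training pipeline: resize to 600x600, random-crop to 448x448, then
# horizontal flip, color jitter, and random rotation (per the paper).
train_tf = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),  # jitter strength is an assumption
    transforms.RandomRotation(15),          # rotation range is an assumption
    transforms.ToTensor(),
])

# Inference pipeline: resize, then center-crop to 448x448.
test_tf = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])

# Placeholder module; the paper uses a Swin Transformer Base backbone.
model = torch.nn.Linear(8, 8)
optimizer = AdamW(model.parameters(), lr=1e-3)  # batch size 12 in the paper
```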