TransFG: A Transformer Architecture for Fine-Grained Recognition

Authors: Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "...TransFG and demonstrate the value of it by conducting experiments on five popular fine-grained benchmarks where we achieve state-of-the-art performance."
Researcher Affiliation | Collaboration | Ju He¹, Jie-Neng Chen¹, Shuai Liu³, Adam Kortylewski², Cheng Yang³, Yutong Bai¹, Changhu Wang³ (¹Johns Hopkins University, ²Max Planck Institute for Informatics, ³ByteDance Inc.)
Pseudocode | No | The paper describes its method in detail using equations and textual descriptions, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code for their method is released.
Open Datasets | Yes | "We evaluate our proposed TransFG on five widely used fine-grained benchmarks, i.e., CUB-200-2011 (Wah et al. 2011), Stanford Cars (Krause et al. 2013), Stanford Dogs (Khosla et al. 2011), NABirds (Van Horn et al. 2015) and iNat2017 (Van Horn et al. 2018)."
Dataset Splits | Yes | "First, we resize input images to 448×448 except 304×304 on iNat2017 for fair comparison (random cropping for training and center cropping for testing)."
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions loading weights from the "official ViT-B/16 model pretrained on ImageNet21k" and using an "SGD optimizer". However, it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "Unless stated otherwise, we implement TransFG as follows. First, we resize input images to 448×448 except 304×304 on iNat2017 for fair comparison (random cropping for training and center cropping for testing). We split image to patches of size 16 and the step size of sliding window is set to be 12. Thus the H, W, P, S in Eq 1 are 448, 448, 16, 12 respectively. The margin α in Eq 9 is set to be 0.4. We load intermediate weights from official ViT-B/16 model pretrained on ImageNet21k. The batch size is set to 16. SGD optimizer is employed with a momentum of 0.9. The learning rate is initialized as 0.03 except 0.003 for Stanford Dogs dataset and 0.01 for iNat2017 dataset. We adopt cosine annealing as the scheduler of optimizer."
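To make the quoted setup concrete, here is a minimal PyTorch/torchvision sketch (not the authors' released implementation) of the preprocessing, the overlapping patch split, and the optimizer/scheduler. Patch size 16, stride 12, lr 0.03, momentum 0.9, batch size 16, and cosine annealing come from the quote; the intermediate resize size before cropping and the strided-convolution realization of the patch embedding are assumptions.

```python
# Sketch of the quoted TransFG training setup; assumed details are marked.
import torch
import torch.nn as nn
from torchvision import transforms

IMG_SIZE = 448    # 304 on iNat2017, per the quote
PATCH_SIZE = 16   # P in Eq 1
STRIDE = 12       # sliding-window step S in Eq 1

# Random crop for training, center crop for testing, as quoted.
# The 600x600 intermediate resize is an assumption; the quote only
# fixes the final 448x448 input size.
train_tf = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.RandomCrop((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])
test_tf = transforms.Compose([
    transforms.Resize((600, 600)),
    transforms.CenterCrop((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])

# Overlapping patch split: with H = W = 448, P = 16, S = 12, each side
# yields (448 - 16) // 12 + 1 = 37 patches, i.e. 37**2 = 1369 patches
# per image instead of the 28**2 = 784 of a non-overlapping ViT-B/16.
n_per_side = (IMG_SIZE - PATCH_SIZE) // STRIDE + 1
assert n_per_side ** 2 == 1369

# One standard way to realize an overlapping patch embedding is a strided
# convolution (768 is the ViT-B hidden size); this choice is an assumption.
patch_embed = nn.Conv2d(3, 768, kernel_size=PATCH_SIZE, stride=STRIDE)

model = nn.Sequential(patch_embed)  # placeholder for the full TransFG model
optimizer = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9)  # 0.003 Dogs, 0.01 iNat2017
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)  # T_max not quoted
```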
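The margin α = 0.4 refers to the paper's contrastive loss (Eq 9), which this report does not reproduce. For orientation only, the sketch below shows a margin-based contrastive loss of the kind the paper describes; the cosine-similarity form and the pairwise averaging over the batch are assumptions, with only the margin value taken from the quote.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(z: torch.Tensor, labels: torch.Tensor,
                            alpha: float = 0.4) -> torch.Tensor:
    """Hedged sketch of a margin-based contrastive loss (Eq 9 is not
    quoted above; only the margin alpha = 0.4 comes from the paper).

    z:      (B, D) per-image embeddings, e.g. classification tokens
    labels: (B,) integer class labels
    """
    z = F.normalize(z, dim=-1)
    sim = z @ z.t()                                    # pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-class mask
    pos = (1.0 - sim)[same].sum()                      # pull same-class pairs together
    neg = torch.clamp(sim - alpha, min=0.0)[~same].sum()  # push apart pairs closer than the margin
    return (pos + neg) / labels.numel() ** 2
```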