TransZero: Attribute-Guided Transformer for Zero-Shot Learning

Authors: Shiming Chen, Ziming Hong, Yang Liu, Guo-Sen Xie, Baigui Sun, Hao Li, Qinmu Peng, Ke Lu, Xinge You

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that TransZero achieves the new state of the art on three ZSL benchmarks. The qualitative results also demonstrate that TransZero refines visual features and provides attribute-level localization." "Our extensive experiments are conducted on three popular ZSL benchmarks, including two fine-grained datasets (e.g., CUB (Welinder et al. 2010) and SUN (Patterson and Hays 2012)) and a coarse-grained dataset (e.g., AWA2 (Xian, Schiele, and Akata 2017))."
Researcher Affiliation | Collaboration | Shiming Chen1*, Ziming Hong1*, Yang Liu2, Guo-Sen Xie3, Baigui Sun2, Hao Li2, Qinmu Peng1, Ke Lu4,5, Xinge You1 (1Huazhong University of Science and Technology; 2Alibaba Group, Hangzhou, China; 3Inception Institute of Artificial Intelligence; 4University of Chinese Academy of Sciences; 5Peng Cheng Laboratory, China)
Pseudocode | No | The paper describes the proposed model and its components using text and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "The codes are available at: https://github.com/shiming-chen/TransZero."
Open Datasets | Yes | "Our extensive experiments are conducted on three popular ZSL benchmarks, including two fine-grained datasets (e.g., CUB (Welinder et al. 2010) and SUN (Patterson and Hays 2012)) and a coarse-grained dataset (e.g., AWA2 (Xian, Schiele, and Akata 2017))."
Dataset Splits | Yes | "We use the training splits proposed in (Xian et al. 2018). CUB has 11,788 images of 200 bird classes (seen/unseen classes = 150/50) depicted with 312 attributes. SUN includes 14,340 images from 717 scene classes (seen/unseen classes = 645/72) depicted with 102 attributes. AWA2 consists of 37,322 images from 50 animal classes (seen/unseen classes = 40/10) depicted with 85 attributes."
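For quick reference, the split statistics quoted above can be captured in a small configuration dict with a consistency check. This is an illustrative sketch only; the ZSL_BENCHMARKS name and layout are ours, not taken from the TransZero repository.

```python
# Benchmark statistics as reported in the paper (Xian et al. 2018 splits).
# Names here are illustrative, not from the TransZero codebase.
ZSL_BENCHMARKS = {
    "CUB":  {"images": 11788, "classes": 200, "seen": 150, "unseen": 50, "attributes": 312},
    "SUN":  {"images": 14340, "classes": 717, "seen": 645, "unseen": 72, "attributes": 102},
    "AWA2": {"images": 37322, "classes": 50,  "seen": 40,  "unseen": 10, "attributes": 85},
}

# Sanity check: seen and unseen classes must partition each dataset's classes.
for name, cfg in ZSL_BENCHMARKS.items():
    assert cfg["seen"] + cfg["unseen"] == cfg["classes"], name
```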
Hardware Specification | No | The paper mentions using "a ResNet101 pre-trained on ImageNet as the CNN backbone" but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using the SGD optimizer and ResNet101, but it does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We use the SGD optimizer with hyperparameters (momentum = 0.9, weight decay = 0.0001) to optimize our model. The learning rate and batch size are set to 0.0001 and 50, respectively. We empirically set λSC to 0.3 and λAR to 0.005 for all datasets. The encoder and decoder layers are set to 1 with one attention head."
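A minimal sketch of this reported training configuration, assuming a PyTorch implementation; `model`, `loss_base`, `loss_sc`, and `loss_ar` are hypothetical placeholders rather than names from the released code.

```python
from torch import nn, optim

# Stand-in module; the actual TransZero model is defined in the authors' repo.
model = nn.Linear(2048, 312)

# SGD with the hyperparameters reported in the paper.
optimizer = optim.SGD(
    model.parameters(),
    lr=1e-4,            # learning rate = 0.0001
    momentum=0.9,
    weight_decay=1e-4,  # weight decay = 0.0001
)

BATCH_SIZE = 50
LAMBDA_SC, LAMBDA_AR = 0.3, 0.005  # loss weights used for all datasets

def total_loss(loss_base, loss_sc, loss_ar):
    # Weighted sum of the main objective and the two auxiliary terms
    # (placeholder decomposition; the exact loss terms are defined in the paper).
    return loss_base + LAMBDA_SC * loss_sc + LAMBDA_AR * loss_ar
```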