Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Authors: Yizhe Zhu, Jianwen Xie, Zhiqiang Tang, Xi Peng, Ahmed Elgammal

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments on three widely used zero-shot learning benchmarks, we show the efficacy of the multi-attention localization and our proposed approach improves the state-of-the-art results by a considerable margin.
Researcher Affiliation | Collaboration | Yizhe Zhu, Rutgers University, yizhe.zhu@rutgers.edu; Jianwen Xie, Hikvision Research Institute, jianwen@ucla.edu; Zhiqiang Tang, Rutgers University, zhiqiang.tang@rutgers.edu; Xi Peng, University of Delaware, xipeng@udel.edu; Ahmed Elgammal, Rutgers University, elgammal@cs.rutgers.edu
Pseudocode | No | The paper includes mathematical equations and descriptions of the model, but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about open-sourcing code or a link to a code repository for the described methodology.
Open Datasets | Yes | We use three widely used zero-shot learning datasets: Caltech-UCSD-Birds 200-2011 (CUB) [37], Oxford Flowers (FLO) [38], Animals with Attributes (AwA) [22].
Dataset Splits | Yes | Hyper-parameters in our models are obtained by grid search on the validation set.
Hardware Specification | Yes | We consistently adopt VGG19 as the backbone and train the model with a batch size of 32 on two GPUs (Titan X).
Software Dependencies | No | We implement our approach on the Pytorch Framework. No specific version number for PyTorch or other software dependencies is provided.
Experiment Setup | Yes | We implement our approach on the Pytorch Framework. For the multi-attention subnet, we take the images of size 448×448 as input in order to achieve high-resolution attention maps. For the joint feature embedding subnet, we resize all the input images to the size of 224×224. We consistently adopt VGG19 as the backbone and train the model with a batch size of 32 on two GPUs (Titan X). We use the SGD optimizer with the learning rate of 0.05, the momentum of 0.9, and weight decay of 5×10⁻⁴ to optimize the objective functions. The learning rate is decayed by 0.1 on the plateau, and the minimum one is set to be 5×10⁻⁴. Hyper-parameters in our models are obtained by grid search on the validation set. The margins in Eq. 7 and Eq. 10 are set to be 0.2 and 0.8, respectively. k in Eq. 8 is set to be 10. The number of parts is set to be 2, since we find that increasing the number of parts will result in little improvement on the zero-shot learning performance and lead to attention redundancy, i.e., maps attend to the same region.
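
The settings quoted in the Experiment Setup row are concrete enough to express in code. The following is a minimal PyTorch sketch of that configuration, not the authors' released implementation: the plain torchvision VGG19 stand-in, the transform pipelines, and the monitored metric are assumptions, and the paper's multi-attention and joint embedding subnets are not reproduced.

```python
# Minimal sketch of the training configuration reported in the paper.
# Assumptions: a plain torchvision VGG19 stands in for the full model,
# and ReduceLROnPlateau approximates "decayed by 0.1 on the plateau".
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision import models, transforms

# Input pipelines: 448x448 images for the multi-attention subnet,
# 224x224 images for the joint feature embedding subnet.
attention_transform = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
])
embedding_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# VGG19 backbone stand-in (the paper adds attention and embedding
# heads on top of this backbone; those are omitted here).
model = models.vgg19(weights=None)

# SGD with lr=0.05, momentum=0.9, and weight decay 5e-4, as reported.
optimizer = SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)

# Decay the learning rate by 0.1 when the monitored metric plateaus,
# with a floor of 5e-4, matching the reported schedule.
scheduler = ReduceLROnPlateau(optimizer, factor=0.1, min_lr=5e-4)

# Per epoch (batch size 32 in the paper), after computing a validation
# metric, step the scheduler:
#   scheduler.step(val_loss)
```

ReduceLROnPlateau with the default mode='min' matches a loss-based plateau criterion; the paper does not state which metric the authors monitored, so val_loss above is a placeholder.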