Semantic-Guided Multi-Attention Localization for Zero-Shot Learning
Authors: Yizhe Zhu, Jianwen Xie, Zhiqiang Tang, Xi Peng, Ahmed Elgammal
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments on three widely used zero-shot learning benchmarks, we show the efficacy of the multi-attention localization and our proposed approach improves the state-of-the-art results by a considerable margin. |
| Researcher Affiliation | Collaboration | Yizhe Zhu Rutgers University yizhe.zhu@rutgers.edu, Jianwen Xie Hikvision Research Institute jianwen@ucla.edu, Zhiqiang Tang Rutgers University zhiqiang.tang@rutgers.edu, Xi Peng University of Delaware xipeng@udel.edu, Ahmed Elgammal Rutgers University elgammal@cs.rutgers.edu |
| Pseudocode | No | The paper includes mathematical equations and descriptions of the model, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about open-sourcing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use three widely used zero-shot learning datasets: Caltech-UCSD-Birds 200-2011 (CUB) [37], Oxford Flowers (FLO) [38], Animals with Attributes (AwA) [22]. |
| Dataset Splits | Yes | Hyper-parameters in our models are obtained by grid search on the validation set. |
| Hardware Specification | Yes | We consistently adopt VGG19 as the backbone and train the model with a batch size of 32 on two GPUs (Titan X). |
| Software Dependencies | No | We implement our approach on the PyTorch framework. No specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | We implement our approach on the PyTorch framework. For the multi-attention subnet, we take the images of size 448×448 as input in order to achieve high-resolution attention maps. For the joint feature embedding subnet, we resize all the input images to the size of 224×224. We consistently adopt VGG19 as the backbone and train the model with a batch size of 32 on two GPUs (Titan X). We use the SGD optimizer with a learning rate of 0.05, momentum of 0.9, and weight decay of 5×10⁻⁴ to optimize the objective functions. The learning rate is decayed by 0.1 on plateau, with a minimum of 5×10⁻⁴. Hyper-parameters in our models are obtained by grid search on the validation set. The margins in Eq. 7 and Eq. 10 are set to 0.2 and 0.8, respectively. k in Eq. 8 is set to 10. The number of parts is set to 2, since we find that increasing the number of parts yields little improvement in zero-shot learning performance and leads to attention redundancy, i.e., maps attend to the same region. |