Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification

Authors: Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin

AAAI 2022, pp. 6605-6613 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.
Researcher Affiliation | Academia | Zhiyu Fang, Xiaobin Zhu*, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin, School of Computer & Communication Engineering, University of Science and Technology Beijing, Beijing, China
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes the method using figures and mathematical equations.
Open Source Code | No | The paper does not provide any specific statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conduct experiments on four benchmark datasets: CUB (Wah et al. 2011), SUN (Patterson and Hays 2012), AwA1 (Lampert, Nickisch, and Harmeling 2009), and AwA2 (Xian et al. 2018a). The detailed information of the datasets is summarized in Table 1.
Dataset Splits | Yes | Consequently, we can individually collect training dataset D^s = {x_i^s, a_i^s, y_i^s}_{i=1}^N and testing dataset D^u = {x_i^u, a_i^u, y_i^u}_{i=1}^M, where x_i^s, x_i^u ∈ X is the i-th visual feature, a_i^s, a_i^u ∈ A is the i-th semantic feature, and y_i^s, y_i^u ∈ Y are their corresponding labels of seen/unseen classes, respectively. The task of ZSL aims to learn a classifier f_ZSL: X → Y^U for recognizing a testing instance x of unseen classes. For adapting to both seen and unseen classes, GZSL adopts a more realistic setting and learns a classifier f_GZSL: X → Y^U ∪ Y^S. For a fair comparison, we adopt the setting as in (Xian et al. 2018a) for training and testing. (A hedged evaluation sketch for this protocol follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It only mentions that ResNet-101 pre-trained on ImageNet was used for feature extraction and that the approach was implemented with PyTorch. (A feature-extraction sketch follows the table.)
Software Dependencies | Yes | Our approach is implemented with PyTorch 1.5.0 and trained for 100 epochs by the Adam optimizer (Kingma and Ba 2015).
Experiment Setup | Yes | We set learning rate as 1.5e-04 for training VAEs, 3.3e-05 for training IEMs, 7.4e-03 for training VSA, 0.5e-03 for training softmax classifier. For all datasets, the batch size of ACMR is set to 50 and the batch size of final softmax classifier is set to 32. Our approach is implemented with PyTorch 1.5.0 and trained for 100 epochs by the Adam optimizer (Kingma and Ba 2015). The size of our aligned cross-modal representation in latent space is 64 for all datasets. (A configuration sketch with these values follows the table.)
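
The split quoted under Dataset Splits follows the (Xian et al. 2018a) benchmark, which is commonly evaluated with per-class accuracies on seen and unseen classes and their harmonic mean. The sketch below is a minimal illustration of that evaluation, assuming NumPy arrays of integer labels over the union label space; the function names are illustrative and not from the paper.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Mean of per-class accuracies over the given label set."""
    accs = [(y_pred[y_true == c] == c).mean() for c in classes if np.any(y_true == c)]
    return float(np.mean(accs)) if accs else 0.0

def gzsl_scores(y_true, y_pred, seen_classes, unseen_classes):
    """Seen accuracy S, unseen accuracy U, and harmonic mean H = 2SU/(S+U)."""
    s = per_class_accuracy(y_true, y_pred, seen_classes)
    u = per_class_accuracy(y_true, y_pred, unseen_classes)
    h = 2 * s * u / (s + u) if (s + u) > 0 else 0.0
    return s, u, h

# Example: predictions over the union label space Y^U ∪ Y^S.
y_true = np.array([0, 0, 1, 5, 5, 6])
y_pred = np.array([0, 1, 1, 5, 6, 6])
print(gzsl_scores(y_true, y_pred, seen_classes=[0, 1], unseen_classes=[5, 6]))
```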
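Although the hardware is unspecified, the quoted feature pipeline (ResNet-101 pre-trained on ImageNet) is a standard torchvision backbone. The following is a minimal sketch of extracting the 2048-d visual features, assuming standard ImageNet-style preprocessing; the paper does not state its exact preprocessing pipeline.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-101 pre-trained on ImageNet; dropping the final fc layer leaves
# the 2048-d global-pooled features typically used as ZSL visual features.
backbone = models.resnet101(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Assumed ImageNet-style preprocessing (not stated in the paper).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    """List of PIL images -> (N, 2048) tensor of visual features."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch)
```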
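The quoted experiment setup can be collected into a small configuration sketch. The linear modules below are placeholders (the paper's actual VAE/IEM/VSA architectures are not reproduced here); only the optimizer learning rates, batch sizes, epoch count, and 64-d latent size come from the text, and the feature/attribute/class dimensions are illustrative.

```python
import torch.nn as nn
from torch.optim import Adam

latent_dim = 64          # size of the aligned cross-modal representation (all datasets)
num_epochs = 100         # training length quoted above
batch_size_acmr = 50     # batch size for ACMR training
batch_size_cls = 32      # batch size for the final softmax classifier

# Placeholder modules standing in for the paper's components.
vaes = nn.ModuleList([nn.Linear(2048, latent_dim),   # visual branch (e.g. ResNet-101 features)
                      nn.Linear(312, latent_dim)])   # semantic branch (e.g. CUB attributes)
iems = nn.ModuleList([nn.Linear(latent_dim, latent_dim) for _ in range(2)])
vsa = nn.Linear(latent_dim, latent_dim)
classifier = nn.Linear(latent_dim, 200)              # e.g. 200 CUB classes

# One Adam optimizer per component, using the learning rates quoted above.
opt_vae = Adam(vaes.parameters(), lr=1.5e-4)
opt_iem = Adam(iems.parameters(), lr=3.3e-5)
opt_vsa = Adam(vsa.parameters(), lr=7.4e-3)
opt_cls = Adam(classifier.parameters(), lr=0.5e-3)
```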