Isometric Propagation Network for Generalized Zero-shot Learning

Authors: Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Xuanyi Dong, Chengqi Zhang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate IPN and compare it with several state-of-the-art ZSL models in both the ZSL setting (where the test set is composed only of unseen classes) and the generalized ZSL setting (where the test set is composed of both seen and unseen classes). We report three standard ZSL evaluation metrics and provide an ablation study of several variants of IPN in the generalized ZSL setting.
Researcher Affiliation | Academia | Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Xuanyi Dong, Chengqi Zhang (University of Technology Sydney; University of Washington)
Pseudocode | No | The paper describes the procedure and equations for IPN, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | Our evaluation includes experiments on three standard benchmark datasets widely used in previous works, i.e., AWA2 (Xian et al., 2019a), CUB (Welinder et al., 2010) and aPY (Farhadi et al., 2009). ... we proposed two new large-scale datasets extracted from tieredImageNet (Ren et al., 2018), i.e., tieredImageNet-Segregated and tieredImageNet-Mixed
Dataset Splits | Yes | In each episode, we sample a subset of training classes Y_T ⊆ Y_seen, and then build a training (or support) set D_T^train and a validation (or query) set D_T^valid from the training samples of Y_T. They are two disjoint sets of samples from the same subset of training classes. We only use D_T^train to generate the prototypes P by the aforementioned procedures, and then apply backpropagation to minimize the loss of D_T^valid. ... All hyperparameters were chosen on the validation sets provided by Xian et al. (2019a).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper names the tools it builds on but gives no versioned software dependencies: For a fair comparison, we follow the setting in (Xian et al., 2019a) and use a pre-trained ResNet101 (He et al., 2016) to extract 2048-dimensional image features without fine-tuning. ... For the rest of the datasets, we train IPN by Adam (Kingma & Ba, 2015) for 360 epochs... The class attributes are the word embeddings of the class names provided by GloVe (Pennington et al., 2014).
Experiment Setup | Yes | All hyperparameters were chosen on the validation sets provided by Xian et al. (2019a). We use the same hyperparameters tuned on AWA2 for other datasets since they are of similar data type, except for aPY, on which we choose learning rate 1.0×10^-3 and weight decay 5.0×10^-4. For the rest of the datasets, we train IPN by Adam (Kingma & Ba, 2015) for 360 epochs with a weight decay factor 1.0×10^-4. The learning rate starts from 2.0×10^-5 and decays by a multiplicative factor 0.1 every 240 epochs. In each epoch, we update IPN on multiple N-way-K-shot tasks (N classes with K samples per class), each corresponding to an episode mentioned in Section 4.5. In our experiments, the number of episodes in each epoch is n/NK, where n is the total size of the training set, N = 30 and K = 1 (a small K can mitigate the imbalance between seen and unseen classes); h_v and h_s are linear transformations; we set temperature γ = 10, threshold ϵ = cos 40° for the cosine similarities between class prototypes, and weight for consistency loss λ = 1. We use propagation steps τ = 2.
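The episodic schedule quoted above (N = 30, K = 1, 360 epochs, Adam with base learning rate 2.0×10^-5 decayed by 0.1 every 240 epochs, n/NK episodes per epoch, disjoint support/query sets) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code (none was released); the function names, the one-query-sample-per-class split, and the use of Python's standard `random` module are assumptions.

```python
import random

# Hyperparameters quoted from the paper's experiment setup.
N_WAY, K_SHOT = 30, 1            # N-way-K-shot episodes; small K mitigates seen/unseen imbalance
EPOCHS = 360
BASE_LR = 2.0e-5                 # Adam base learning rate
LR_DECAY, DECAY_EVERY = 0.1, 240 # multiply lr by 0.1 every 240 epochs
WEIGHT_DECAY = 1.0e-4

def episodes_per_epoch(n_train: int) -> int:
    """Episodes per epoch = n / (N * K), where n is the training-set size."""
    return n_train // (N_WAY * K_SHOT)

def lr_at_epoch(epoch: int) -> float:
    """Step-decayed learning rate at a given 0-indexed epoch."""
    return BASE_LR * (LR_DECAY ** (epoch // DECAY_EVERY))

def sample_episode(samples_by_class: dict, rng: random.Random):
    """Sample one N-way-K-shot episode: disjoint support (D_T^train) and
    query (D_T^valid) sets drawn from the same subset of training classes.
    One query sample per class is an illustrative choice, not from the paper."""
    classes = rng.sample(sorted(samples_by_class), N_WAY)
    support, query = {}, {}
    for c in classes:
        picked = rng.sample(samples_by_class[c], K_SHOT + 1)
        support[c], query[c] = picked[:K_SHOT], picked[K_SHOT:]
    return support, query
```

Under this sketch, the support set alone would generate the prototypes P, and the loss on the query set would drive backpropagation, matching the episodic procedure described in the Dataset Splits row.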