Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning

Authors: Chaoqun Wang, Xuejin Chen, Shaobo Min, Xiaoyan Sun, Houqiang Li

AAAI 2021, pp. 2710-2718

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our DCEN achieves superior performance on four GZSL benchmarks. In this section, we first evaluate each component of DCEN and then compare DCEN with state-of-the-art GZSL methods. Results are given in Table 4, which shows that DCEN surpasses existing methods on four datasets by a large margin.
Researcher Affiliation | Academia | Chaoqun Wang (1,2), Xuejin Chen* (1,2), Shaobo Min (2), Xiaoyan Sun (2), Houqiang Li (1,2). (1) School of Data Science; (2) The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China. cq14@mail.ustc.edu.cn, xjchen99@ustc.edu.cn, mbobo@mail.ustc.edu.cn, {sunxiaoyan, lihq}@ustc.edu.cn
Pseudocode | No | The paper provides architectural diagrams (Figure 2 and Figure 3) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper makes no explicit statement about releasing code for the described methodology and provides no link to a code repository.
Open Datasets | Yes | We adopt four widely-used GZSL benchmarks, which are Caltech-UCSD Birds-200-2011 (CUB) (Wah et al. 2011), SUN (Patterson and Hays 2012), Animals with Attributes2 (AWA2) (Xian et al. 2018a), and Attribute Pascal and Yahoo (aPY) (Farhadi et al. 2009), for the following experiments.
Dataset Splits | Yes | Table 1: Detailed statistics of datasets (reproduced below; a split-size check sketch follows the table).

Dataset | Seen/Unseen | Attributes | Train | Val | Test
---|---|---|---|---|---
CUB | 150/50 | 312 | 7,057 | 1,764 | 2,967
AWA2 | 40/10 | 85 | 23,527 | 5,882 | 7,913
aPY | 20/12 | 64 | 5,932 | 1,483 | 7,924
SUN | 645/72 | 102 | 10,320 | 2,580 | 1,440
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using CNN models and refers to supplementary material for implementation details, but it does not specify software names with version numbers in the main text.
Experiment Setup | Yes | The overall objective function of DCEN is L_all = λ1 L_id + L_sa + λ2 L_sp, where λ1 and λ2 are hyper-parameters to balance L_id and L_sp. We find λ1 = 0.1 is suitable for most cases, which is used for the following experiments. In DCEN, K, τ, and m are three minor hyper-parameters. K determines the architecture h(·) in Eq. (3). Here, we evaluate the effects of different K in Fig. 6 (b). It can be seen that K = 2 is suitable for most cases. τ and m control the cosine similarity scaling in Eq. (5) and momentum updating of g(·) in Eq. (6). In this paper, we set τ = 0.07 and m = 0.999, which are commonly used in contrastive learning (He et al. 2020).
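
To make the quoted setup concrete, below is a minimal PyTorch-style sketch of the overall objective and the roles of τ and m. This is an assumption-laden illustration, not the authors' released code: the paper does not name its framework in the main text, and only λ1 = 0.1, τ = 0.07, and m = 0.999 come from the quoted setup. The function names, tensor shapes, and the λ2 value are placeholders.

```python
import torch
import torch.nn.functional as F

# Constants quoted in the paper; λ2 is NOT given in the quoted text,
# so 1.0 below is a placeholder, not the authors' value.
LAMBDA1 = 0.1    # weight on L_id
LAMBDA2 = 1.0    # weight on L_sp (placeholder)
TAU = 0.07       # cosine-similarity temperature, role of τ in Eq. (5)
M = 0.999        # momentum for updating g(·), role of m in Eq. (6)

def total_loss(l_id: torch.Tensor, l_sa: torch.Tensor, l_sp: torch.Tensor) -> torch.Tensor:
    """L_all = λ1·L_id + L_sa + λ2·L_sp (the quoted overall objective)."""
    return LAMBDA1 * l_id + l_sa + LAMBDA2 * l_sp

def scaled_cosine_logits(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Temperature-scaled cosine similarities (how τ typically enters a contrastive loss)."""
    q = F.normalize(query, dim=-1)   # (B, D) unit-norm queries
    k = F.normalize(keys, dim=-1)    # (N, D) unit-norm keys
    return q @ k.t() / TAU           # (B, N) logits

@torch.no_grad()
def momentum_update(g_online: torch.nn.Module, g_momentum: torch.nn.Module) -> None:
    """EMA update of the momentum encoder g(·): p_m ← m·p_m + (1−m)·p_o."""
    for p_o, p_m in zip(g_online.parameters(), g_momentum.parameters()):
        p_m.data.mul_(M).add_(p_o.data, alpha=1.0 - M)
```

The EMA form of `momentum_update` follows the MoCo-style update that the quoted text points to via (He et al. 2020); the exact losses L_id, L_sa, and L_sp are defined in the paper and are not reproduced here.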
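Since the Dataset Splits row above pins down exact split sizes, a small lookup table makes it easy to verify that a re-implementation loads the same splits. This is a hypothetical convenience helper: only the numbers come from Table 1; the dict layout and function name are our own.

```python
# Hypothetical helper: numbers copied from Table 1 of the paper;
# the structure and check function are illustrative additions.
DATASET_STATS = {
    # name: (seen, unseen, attributes, train, val, test)
    "CUB":  (150, 50, 312,  7_057, 1_764, 2_967),
    "AWA2": ( 40, 10,  85, 23_527, 5_882, 7_913),
    "aPY":  ( 20, 12,  64,  5_932, 1_483, 7_924),
    "SUN":  (645, 72, 102, 10_320, 2_580, 1_440),
}

def check_split_sizes(name: str, n_train: int, n_val: int, n_test: int) -> None:
    """Raise if loaded split sizes disagree with the published statistics."""
    *_, train, val, test = DATASET_STATS[name]
    if (n_train, n_val, n_test) != (train, val, test):
        raise ValueError(
            f"{name}: got {(n_train, n_val, n_test)}, expected {(train, val, test)}"
        )

# Example: check_split_sizes("CUB", 7_057, 1_764, 2_967) passes silently.
```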