HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

Authors: Shiming Chen, Guosen Xie, Yang Liu, Qinmu Peng, Baigui Sun, Hao Li, Xinge You, Ling Shao

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL.
Researcher Affiliation | Collaboration | Huazhong University of Science and Technology (HUST), China; Alibaba Group, Hangzhou, China; Mohamed bin Zayed University of AI (MBZUAI), UAE; Inception Institute of Artificial Intelligence (IIAI), UAE
Pseudocode | No | The paper describes the method using equations and text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/shiming-chen/HSVA.
Open Datasets | Yes | We conduct extensive experiments on four well-known ZSL benchmark datasets, including fine-grained datasets (e.g., CUB [43] and SUN [44]) and coarse-grained datasets (e.g., AWA1 [4] and AWA2 [45]).
Dataset Splits | Yes | We use the training splits proposed in [46]. In the CZSL setting, we synthesize 800, 400, and 200 features per unseen class to train the classifier for the AWA1, CUB, and SUN datasets, respectively. In the GZSL setting, we take 400 synthesized features per unseen class and 200 synthesized features per seen class to train the classifier for all datasets.
Hardware Specification | No | The paper mentions a CNN backbone (ResNet-101) but does not provide specific details on the hardware (e.g., GPU model, CPU type) used for the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and ResNet-101 but does not specify software versions for the libraries, frameworks, or programming languages used in the implementation.
Experiment Setup | Yes | We employ the Adam optimizer [47] with β1 = 0.5 and β2 = 0.999. We use an annealing scheme [48] to increase the weights γ, λ1, λ2, λ3, with the same settings for all datasets. Specifically, γ is increased by a rate of 0.0026 per epoch until epoch 90, λ1 is increased from epoch 21 to 75 by 0.044 per epoch, and λ2, λ3 are increased from epoch 0 to epoch 22 by a rate of 0.54 per epoch. The dimensions of the structure-aligned and distribution-aligned spaces are set to 2048 and 64, respectively, for three datasets (i.e., CUB, AWA1, AWA2), and to 2048 and 128 for the SUN benchmark.
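
The annealing schedule in the Experiment Setup row is fully specified by per-epoch rates and epoch ranges, so it can be made concrete. Below is a minimal Python sketch of that schedule; the helper name annealed_weight and the assumption that each weight is held constant outside its annealing window are ours, not taken from the paper or its released code.

    def annealed_weight(epoch: int, start: int, end: int, rate: float) -> float:
        # Increase a loss weight by `rate` per epoch between `start` and `end`,
        # holding it constant outside that window (assumed behavior).
        steps = min(max(epoch - start, 0), end - start)
        return steps * rate

    def loss_weights(epoch: int) -> dict:
        # Per-epoch weights with the rates and epoch ranges reported in the paper
        # (identical across all four datasets).
        lam23 = annealed_weight(epoch, 0, 22, 0.54)  # lambda2, lambda3: epochs 0-22
        return {
            "gamma": annealed_weight(epoch, 0, 90, 0.0026),    # until epoch 90
            "lambda1": annealed_weight(epoch, 21, 75, 0.044),  # epochs 21-75
            "lambda2": lam23,
            "lambda3": lam23,
        }

    # e.g. at epoch 30: gamma ≈ 0.078, lambda1 ≈ 0.396, lambda2 = lambda3 ≈ 11.88
    print(loss_weights(30))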
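
The Dataset Splits row likewise pins down how many features are synthesized per class to train the final classifier. A hypothetical configuration capturing those counts (the constant names are illustrative; only the numbers come from the paper):

    # Features synthesized per unseen class in the conventional ZSL setting.
    SYN_PER_UNSEEN_CZSL = {"AWA1": 800, "CUB": 400, "SUN": 200}

    # Features synthesized per class in the generalized ZSL setting (all datasets).
    SYN_PER_CLASS_GZSL = {"unseen": 400, "seen": 200}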