DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning
Authors: Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Huajun Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that DUET can achieve state-of-the-art performance on three standard ZSL benchmarks and a knowledge graph equipped ZSL benchmark, and that its components are effective and its predictions are interpretable. |
| Researcher Affiliation | Collaboration | Zhuo Chen1, 2, 6, Yufeng Huang3, 6, Jiaoyan Chen4, Yuxia Geng1, 6, Wen Zhang3, 6, Yin Fang1, 6, Jeff Z. Pan5, Huajun Chen1, 2, 6* 1College of Computer Science and Technology, Zhejiang University 2Donghai Laboratory, Zhoushan 316021, China 3School of Software Technology, Zhejiang University 4Department of Computer Science, The University of Manchester 5School of Informatics, The University of Edinburgh 6Alibaba-Zhejiang University Joint Institute of Frontier Technologies |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found in the paper. |
| Open Source Code | Yes | Our code is available at https://github.com/zjukg/DUET. |
| Open Datasets | Yes | We select three standard attribute equipped ZSL benchmarks AWA2 (Xian et al. 2019), CUB (Welinder et al. 2010), SUN (Patterson and Hays 2012) with their splits proposed in (Xian et al. 2019), as well as a knowledge graph (KG) equipped benchmark AWA2-KG which has the same split as AWA2 but includes semantic information about hierarchical classes and attributes, for evaluation. |
| Dataset Splits | Yes | "We select three standard attribute equipped ZSL benchmarks AWA2 (Xian et al. 2019), CUB (Welinder et al. 2010), SUN (Patterson and Hays 2012) with their splits proposed in (Xian et al. 2019)"; the paper also states that "γ is the calibration factor tuned on a held-out validation set" (see the calibration sketch after the table). |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory specifications) were found. The paper only mentions 'ViT-base as the vision encoder', which is a model architecture, not hardware. |
| Software Dependencies | No | The paper mentions software components like 'pre-trained language models (PLMs)', 'vision transformer', and 'ResNet', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For those coefficients in AWA2, we set λ_ar to 0.01, λ_con to 0.05, λ_cmr to 1, λ_acl to 0.01, r_rap to 0.5, ρ to 0.4 and γ to 0.8. (See the loss-weight sketch after the table.) |
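
The AWA2 coefficients quoted in the Experiment Setup row are loss weights. The sketch below shows one plausible way they could enter a weighted training objective; the term names (attribute regression, contrastive, cross-modal reconstruction, attribute-level contrastive) are inferred from the coefficient subscripts and may not match the variable names in the released code at https://github.com/zjukg/DUET.

```python
# Hedged sketch: combining loss terms with the AWA2 coefficients quoted above.
# The meaning of each term is an assumption inferred from its subscript;
# consult the paper and repository for the actual definitions.
import torch

# Coefficients reported for AWA2 in the Experiment Setup row.
LAMBDA_AR = 0.01    # λ_ar: attribute regression weight (assumed meaning)
LAMBDA_CON = 0.05   # λ_con: contrastive loss weight (assumed meaning)
LAMBDA_CMR = 1.0    # λ_cmr: cross-modal reconstruction weight (assumed meaning)
LAMBDA_ACL = 0.01   # λ_acl: attribute-level contrastive weight (assumed meaning)

def total_loss(loss_ar: torch.Tensor,
               loss_con: torch.Tensor,
               loss_cmr: torch.Tensor,
               loss_acl: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the individual loss terms (illustrative only)."""
    return (LAMBDA_AR * loss_ar
            + LAMBDA_CON * loss_con
            + LAMBDA_CMR * loss_cmr
            + LAMBDA_ACL * loss_acl)
```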
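
The Dataset Splits row quotes a calibration factor γ tuned on a held-out validation set. A common use of such a factor in generalized zero-shot learning is calibrated stacking, where seen-class scores are penalized by γ before prediction; the sketch below illustrates that idea under that assumption and does not claim to reproduce DUET's exact inference code.

```python
# Hedged sketch of calibrated stacking: seen-class scores are reduced by the
# calibration factor gamma before taking the argmax. Whether DUET applies γ
# exactly this way is not stated here; this is illustrative only.
import numpy as np

def calibrated_predict(scores: np.ndarray,
                       seen_class_mask: np.ndarray,
                       gamma: float = 0.8) -> np.ndarray:
    """Return predicted class indices after penalizing seen-class scores.

    scores: (num_samples, num_classes) compatibility scores.
    seen_class_mask: boolean vector of length num_classes, True for seen classes.
    gamma: calibration factor (0.8 is the AWA2 value quoted above).
    """
    calibrated = scores - gamma * seen_class_mask.astype(scores.dtype)
    return calibrated.argmax(axis=1)

# Example: 2 samples, 3 classes (classes 0 and 1 are seen, class 2 is unseen).
scores = np.array([[0.9, 0.2, 0.5],
                   [0.1, 0.7, 0.65]])
seen = np.array([True, True, False])
print(calibrated_predict(scores, seen))  # -> [2 2] after calibration
```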