Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Tat-Seng Chua, Yao Zhao

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on three benchmarks show that our AENet outperforms existing state-of-the-art ZSL methods.
Researcher Affiliation Academia Man Liu2,4, Huihui Bai1,2,4, Feng Li3, Chunjie Zhang2,4, Yunchao Wei2,4, Tat-Seng Chua5, Yao Zhao2,4 — 1 Tangshan Research Institute of Beijing Jiaotong University; 2 Institute of Information Science, Beijing Jiaotong University; 3 Hefei University of Technology; 4 Beijing Key Laboratory of Advanced Information Science and Network Technology; 5 National University of Singapore
Pseudocode No The paper describes the methodology using textual explanations, mathematical equations, and a visual flowchart in Figure 2. It does not include any explicit pseudocode blocks or algorithms.
Open Source Code Yes Code: https://github.com/ManLiuCoder/AENet
Open Datasets Yes We conduct experiments on three standard benchmark datasets: Caltech-UCSD Birds-200-2011 (CUB) (Welinder et al. 2010), SUN Attribute (SUN) (Patterson and Hays 2012), and Animals with Attributes 2 (AwA2) (Xian et al. 2019).
Dataset Splits Yes The categorization into seen and unseen categories follows the Proposed Split (PS) (Xian et al. 2019). The CUB dataset consists of 11,788 images illustrating 200 bird classes, with a split of 150/50 for seen/unseen classes... SUN is a vast scene dataset that contains 14,340 images spanning 717 classes, divided into seen/unseen classes at 645/72... AwA2 contains 37,322 images of 50 animal classes, with a 40/10 split for seen/unseen classes...
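The Proposed Split statistics quoted above can be captured as data and sanity-checked; the sketch below uses only the class and image counts reported in the paper (the dictionary name is ours, not from the authors' code):

```python
# Seen/unseen class splits under the Proposed Split (PS) of Xian et al. (2019),
# as reported in the paper. Image counts are dataset totals, not per-split counts.
PROPOSED_SPLIT = {
    "CUB":  {"images": 11788, "classes": 200, "seen": 150, "unseen": 50},
    "SUN":  {"images": 14340, "classes": 717, "seen": 645, "unseen": 72},
    "AwA2": {"images": 37322, "classes": 50,  "seen": 40,  "unseen": 10},
}

# Sanity check: seen and unseen classes must partition all classes.
for name, split in PROPOSED_SPLIT.items():
    assert split["seen"] + split["unseen"] == split["classes"], name
```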
Hardware Specification Yes Our framework is implemented using PyTorch and executed on an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies No Our framework is implemented using PyTorch and executed on an NVIDIA GeForce RTX 3090 GPU. The paper mentions PyTorch but does not specify a version number or other software dependencies with versions.
Experiment Setup Yes The input image resolution is 224×224, with a patch size of 16×16. ... We sweep prompt length T ∈ {1, 3, 5, 7, 9} to investigate the effect of the prompts P on classification performance. ...we set T = 5 for CUB, SUN, and AwA2 datasets. ...λcons and λdeb are the hyper-parameters controlling the weights of the semantic consistency loss Lcons and the debiasing loss Ldeb, respectively. ... The best H is obtained when λcons = 1.0. ... Thus, we set λcons = 1.0 for optimal results.
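The hyper-parameter search quoted above can be sketched as a simple grid sweep. This is a minimal illustration, not the authors' code: `evaluate` is a toy stand-in for training AENet and returning the harmonic mean H, and the λcons grid values (other than 1.0) are assumptions for illustration:

```python
from itertools import product

def evaluate(T, lam_cons):
    # Toy surrogate score peaking at T = 5 and lam_cons = 1.0,
    # the settings the paper reports as best. A real sweep would
    # train the model and measure H on seen/unseen classes.
    return 100 - abs(T - 5) - 10 * abs(lam_cons - 1.0)

def sweep():
    # Prompt lengths from the paper's sweep; lam_cons grid is illustrative.
    configs = product([1, 3, 5, 7, 9], [0.1, 0.5, 1.0, 2.0])
    return max(configs, key=lambda cfg: evaluate(*cfg))
```

Running `sweep()` selects the configuration maximizing the surrogate score, mirroring how the paper picks T = 5 and λcons = 1.0 from the reported H values.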