Prompt Learning with Quaternion Networks

Authors: Boya Shi, Zhengqin Xu, Shuai Jia, Chao Ma

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on 11 datasets demonstrate that QNet outperforms state-of-the-art prompt learning techniques in base-to-novel generalization, cross-dataset transfer, and domain transfer scenarios with fewer learnable parameters.
Researcher Affiliation | Academia | Boya Shi1,2, Zhengqin Xu1, Shuai Jia1, Chao Ma1; 1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2 National Innovation Institute of Defense Technology; {boya.shi, fate311, jiashuai, chaoma}@sjtu.edu.cn
Pseudocode | No | No clearly labeled 'Pseudocode' or 'Algorithm' block was found; the methodology is described using text and mathematical equations.
Open Source Code | Yes | The source code is available at https://github.com/SHIBOYA/QNet.
Open Datasets | Yes | We follow Zhou et al. (2022b) by using 11 image recognition datasets that cover various tasks. Concretely, we include ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004) for generic object classification, OxfordPets (Parkhi et al., 2012), StanfordCars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and Aircraft (Maji et al., 2013) for fine-grained classification, SUN397 (Xiao et al., 2010) for scene recognition, UCF101 (Soomro et al., 2012) for action recognition, DTD (Cimpoi et al., 2014) for texture recognition, and EuroSAT (Helber et al., 2019) for satellite image recognition.
Dataset Splits | Yes | We evaluate our method in three scenarios: 1) Base-to-novel generalization, generalizing from base classes to new classes within a dataset; 2) Cross-dataset evaluation, transferring across different datasets; and 3) Domain generalization, transferring on four variant datasets of ImageNet. [...] To maintain robust results, we validate our method using 16 shots and report the average results over three runs.
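In the base-to-novel setting, prompt-learning papers in this line of work (e.g., CoOp/CoCoOp) conventionally summarize results as the harmonic mean of base-class and novel-class accuracy, with each accuracy averaged over runs; whether QNet uses exactly this metric is an assumption here. A minimal sketch with hypothetical accuracy values:

```python
# Summarize base-to-novel results: average per-run accuracies, then take
# the harmonic mean of base and novel accuracy. All numbers are
# hypothetical placeholders, not values from the paper.
def harmonic_mean(base: float, novel: float) -> float:
    return 2 * base * novel / (base + novel)

base_runs = [80.0, 81.0, 79.0]   # base-class accuracy per run (%)
novel_runs = [74.0, 73.0, 75.0]  # novel-class accuracy per run (%)

base_avg = sum(base_runs) / len(base_runs)     # 80.0
novel_avg = sum(novel_runs) / len(novel_runs)  # 74.0
hm = harmonic_mean(base_avg, novel_avg)
print(round(hm, 2))  # → 76.88
```

The harmonic mean penalizes a large gap between base and novel accuracy, so it rewards methods that generalize rather than overfit to base classes.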
Hardware Specification | Yes | We train QNet for 7 epochs with a batch size of 1 on a single NVIDIA RTX 8000 GPU.
Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries) were provided in the paper. It mentions 'prompt-tune a pre-trained ViT-B/16 CLIP model' and using 'pre-trained word embeddings', but no software versions.
Experiment Setup | Yes | For the training of QNet, we prompt-tune a pre-trained ViT-B/16 CLIP model and set prompt depth L to 7 and language and vision prompt lengths to 2. We train QNet for 7 epochs with a batch size of 1 on a single NVIDIA RTX 8000 GPU.
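The reported setup can be collected into a small configuration object; a minimal sketch in which the class and field names are hypothetical and only the values come from the paper:

```python
# Hyperparameters as reported in the paper; the dataclass and its field
# names are illustrative assumptions, not the authors' code.
from dataclasses import dataclass


@dataclass(frozen=True)
class QNetConfig:
    backbone: str = "ViT-B/16"    # pre-trained CLIP backbone
    prompt_depth: int = 7         # prompt depth L
    language_prompt_len: int = 2  # language prompt length
    vision_prompt_len: int = 2    # vision prompt length
    epochs: int = 7
    batch_size: int = 1
    shots: int = 16               # shots used for evaluation
    num_runs: int = 3             # results averaged over runs


cfg = QNetConfig()
print(cfg.backbone, cfg.prompt_depth, cfg.epochs)
```

Freezing the dataclass keeps the reported settings immutable, which makes it easier to log and compare runs during a reproduction attempt.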