Prompt Learning with Quaternion Networks
Authors: Boya Shi, Zhengqin Xu, Shuai Jia, Chao Ma
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on 11 datasets demonstrate that QNet outperforms state-of-the-art prompt learning techniques in base-to-novel generalization, cross-dataset transfer, and domain transfer scenarios with fewer learnable parameters. |
| Researcher Affiliation | Academia | Boya Shi (1,2), Zhengqin Xu (1), Shuai Jia (1), Chao Ma (1); (1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; (2) National Innovation Institute of Defense Technology. {boya.shi, fate311, jiashuai, chaoma}@sjtu.edu.cn |
| Pseudocode | No | No clearly labeled 'Pseudocode' or 'Algorithm' block was found. The methodology is described using text and mathematical equations. |
| Open Source Code | Yes | The source code is available at https://github.com/SHIBOYA/QNet. |
| Open Datasets | Yes | We follow Zhou et al. (2022b) by using 11 image recognition datasets that cover various tasks. Concretely, we include ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004) for generic object classification, OxfordPets (Parkhi et al., 2012), StanfordCars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and Aircraft (Maji et al., 2013) for fine-grained classification, SUN397 (Xiao et al., 2010) for scene recognition, UCF101 (Soomro et al., 2012) for action recognition, DTD (Cimpoi et al., 2014) for texture recognition, and EuroSAT (Helber et al., 2019) for satellite image recognition. |
| Dataset Splits | Yes | We evaluate our method in three scenarios: 1) Base-to-novel generalization, generalizing from base classes to new classes within a dataset; 2) Cross-dataset evaluation, transferring across different datasets; and 3) Domain generalization, transferring to four variant datasets of ImageNet. [...] To maintain robust results, we validate our method using 16 shots and report the average results over three runs. |
| Hardware Specification | Yes | We train QNet for 7 epochs with a batch size of 1 on a single NVIDIA RTX 8000 GPU. |
| Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries) were provided in the paper. It mentions 'prompt-tune a pre-trained ViT-B/16 CLIP model' and using 'pre-trained word embeddings', but no software versions. |
| Experiment Setup | Yes | For the training of QNet, we prompt-tune a pre-trained ViT-B/16 CLIP model and set prompt depth L to 7 and language and vision prompt lengths to 2. We train QNet for 7 epochs with a batch size of 1 on a single NVIDIA RTX 8000 GPU. |
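The "fewer learnable parameters" claim in the Research Type row comes from quaternion algebra: a quaternion layer shares one set of component weights across the four input components via the Hamilton product, cutting the weight count to a quarter of an equivalent real layer. Below is a minimal PyTorch sketch of this standard construction; it illustrates the general technique, not the authors' QNet module (see https://github.com/SHIBOYA/QNet for the official code).

```python
import torch
import torch.nn as nn


class QuaternionLinear(nn.Module):
    """Minimal quaternion linear layer: 4x fewer weights than nn.Linear.

    Sketch of the generic quaternion-network building block, not QNet itself.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        assert in_features % 4 == 0 and out_features % 4 == 0
        n_in, n_out = in_features // 4, out_features // 4
        # One real weight matrix per quaternion component (r, i, j, k).
        self.r = nn.Parameter(torch.randn(n_in, n_out) * 0.02)
        self.i = nn.Parameter(torch.randn(n_in, n_out) * 0.02)
        self.j = nn.Parameter(torch.randn(n_in, n_out) * 0.02)
        self.k = nn.Parameter(torch.randn(n_in, n_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input into its four quaternion components.
        xr, xi, xj, xk = x.chunk(4, dim=-1)
        # Hamilton product of the input quaternion with the weight quaternion.
        yr = xr @ self.r - xi @ self.i - xj @ self.j - xk @ self.k
        yi = xr @ self.i + xi @ self.r + xj @ self.k - xk @ self.j
        yj = xr @ self.j - xi @ self.k + xj @ self.r + xk @ self.i
        yk = xr @ self.k + xi @ self.j - xj @ self.i + xk @ self.r
        return torch.cat([yr, yi, yj, yk], dim=-1)
```

For example, `QuaternionLinear(512, 512)` holds 4 × 128 × 128 = 65,536 weights, versus 262,144 for a real `nn.Linear(512, 512)` (bias omitted), since the same four component matrices are reused across all four output components.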
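The Dataset Splits row reports 16-shot training with results averaged over three runs. The sketch below assumes per-class subsampling keyed by a seed; `train_and_eval` and `train_labels` are hypothetical placeholders, not names from the QNet repository.

```python
import random


def fewshot_indices(labels, shots=16, seed=0):
    """Sample up to `shots` training examples per class, reproducibly per seed."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    picked = []
    for idxs in by_class.values():
        picked += rng.sample(idxs, min(shots, len(idxs)))
    return picked


# Hypothetical driver: train on each 16-shot split and average accuracy over
# three seeds, mirroring "16 shots ... average results over three runs".
# accs = [train_and_eval(fewshot_indices(train_labels, 16, seed)) for seed in (1, 2, 3)]
# print(sum(accs) / len(accs))
```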
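The Experiment Setup row pins down the main hyperparameters. The following dataclass collects them in one place; the field names are illustrative assumptions, not the configuration schema used by the QNet code.

```python
from dataclasses import dataclass


@dataclass
class QNetConfig:
    """Reported QNet training setup; field names are illustrative only."""
    backbone: str = "ViT-B/16"     # pre-trained CLIP backbone
    prompt_depth: int = 7          # L: transformer layers receiving prompts
    language_prompt_len: int = 2   # learnable language prompt tokens
    vision_prompt_len: int = 2     # learnable vision prompt tokens
    epochs: int = 7
    batch_size: int = 1
    shots: int = 16                # few-shot training examples per class
    n_runs: int = 3                # results averaged over three runs


cfg = QNetConfig()
```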