LAMM: Label Alignment for Multi-Modal Prompt Learning
Authors: Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang, Zefang Yu, Ke Ji, Mingye Xie, Ting Liu, Yuzhuo Fu
AAAI 2024 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on 11 downstream vision datasets and demonstrate that our method significantly improves the performance of existing multi-modal prompt learning models in few-shot scenarios, exhibiting an average accuracy improvement of 2.31% compared to the state-of-the-art methods on 16 shots. |
| Researcher Affiliation | Academia | (1) School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China; (2) School of Biomedical Engineering, Shanghai Jiao Tong University, China; (3) School of Computer Science and Engineering, Southeast University, China |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and dataset will be publicly available at https://github.com/gaojingsheng/LAMM. |
| Open Datasets | Yes | We follow the datasets used in previous works (Zhou et al. 2022b; Khattak et al. 2022) and evaluate our method on 11 image classification datasets, including Caltech101 (Fei-Fei, Fergus, and Perona 2007), ImageNet (Deng et al. 2009), Oxford Pets (Parkhi et al. 2012), Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), FGVC (Maji et al. 2013), SUN397 (Xiao et al. 2010), UCF101 (Soomro, Zamir, and Shah 2012), DTD (Cimpoi et al. 2014) and EuroSAT (Helber et al. 2019). |
| Dataset Splits | No | Specifically, we use 1, 2, 4, 8, and 16 shots for training respectively, and evaluate the models on the full test sets. All experimental results are averaged over runs with seeds 1, 2, and 3. The paper specifies the training shots and evaluation on the full test sets, but it does not explicitly mention a separate validation split or explain how hyperparameters were tuned in its absence (a minimal sketch of this few-shot protocol follows the table). |
| Hardware Specification | Yes | All of our experiments are conducted on a single NVIDIA A100. |
| Software Dependencies | No | The paper mentions models like CLIP, CoOp, and MaPLe and that the training parameters were kept the same as their original settings, but it does not specify software dependencies like programming language or library versions (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We keep the same training parameters (e.g., learning rate, epochs, and other prompt parameters) of each model as in their original settings, where CoOp is trained for 50 epochs and MaPLe for 5. As for vanilla CLIP + LAMM, we follow the settings of CoOp. All of our experiments are conducted on a single NVIDIA A100. The corresponding hyper-parameters are fixed across all datasets in our work (see the hedged configuration sketch below the table). |
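
The few-shot protocol quoted in the "Dataset Splits" row (k-shot training subsets, evaluation on the full test set, results averaged over seeds 1, 2, and 3) can be summarized with a minimal sketch. The helper names `sample_few_shot` and `train_and_eval`, and the `(image, label)` dataset structure, are illustrative assumptions, not taken from the paper or its repository.

```python
import random
from collections import defaultdict

def sample_few_shot(train_set, num_shots, seed):
    """Draw `num_shots` examples per class using a fixed seed.

    `train_set` is assumed to be a list of (image, label) pairs; this
    structure is an illustrative assumption, not the paper's code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image, label in train_set:
        by_class[label].append((image, label))
    subset = []
    for label, items in by_class.items():
        rng.shuffle(items)
        subset.extend(items[:num_shots])
    return subset

def run_protocol(train_set, test_set, train_and_eval,
                 shots=(1, 2, 4, 8, 16), seeds=(1, 2, 3)):
    """Train at each shot count and report accuracy averaged over seeds.

    `train_and_eval` is a hypothetical callback that trains a prompt-learning
    model on the given subset and returns accuracy on the full test set.
    """
    results = {}
    for k in shots:
        accs = [train_and_eval(sample_few_shot(train_set, k, s), test_set)
                for s in seeds]
        results[k] = sum(accs) / len(accs)
    return results
```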
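
For the training settings in the "Experiment Setup" row, only the epoch counts (50 for CoOp, 5 for MaPLe, with vanilla CLIP + LAMM following CoOp) and the statement that hyper-parameters are fixed across datasets come from the paper; the dictionary below is a hypothetical way to record them, not the authors' configuration format.

```python
# Hypothetical configuration table; the values follow the quoted settings,
# while the key names and structure are assumptions for illustration only.
TRAINING_CONFIGS = {
    "CoOp + LAMM":  {"epochs": 50},  # CoOp's original 50-epoch schedule
    "MaPLe + LAMM": {"epochs": 5},   # MaPLe's original 5-epoch schedule
    "CLIP + LAMM":  {"epochs": 50},  # vanilla CLIP + LAMM follows CoOp
}
# Learning rate and other prompt parameters are kept at each base model's
# original values and held fixed across all 11 datasets.
```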