PLOT: Prompt Learning with Optimal Transport for Vision-Language Models

Authors: Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are conducted on the few-shot recognition task and the improvement demonstrates the superiority of our method."
Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh, PA, USA; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Tsinghua University, Beijing, China; New York University, Abu Dhabi, UAE
Pseudocode | Yes | "The detailed algorithms of the training and testing processes are shown in Algorithms A1 and A2."
Open Source Code | Yes | "The code is available at https://github.com/CHENGY12/PLOT."
Open Datasets | Yes | "We followed the experimental settings in CoOp (Zhou et al., 2021b) for the few-shot learning evaluation. The experiments are conducted on the 11 visual recognition datasets, including Caltech101 (Fei-Fei et al., 2004), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), FGVCAircraft (Maji et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), ImageNet (Deng et al., 2009), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), SUN397 (Xiao et al., 2010), and UCF101 (Soomro et al., 2012)."
Dataset Splits | Yes | "All experiments adopted the few-shot evaluation protocol used in CLIP (Radford et al., 2021) and CoOp (Zhou et al., 2021b), where we respectively choose 1, 2, 4, 8, and 16 shots for model training and use the original test set for evaluation." Table A1 lists the detailed statistics of the datasets used in the experiments (columns: Dataset, Classes, Training size, Testing size, Task).
Hardware Specification | Yes | "All models are conducted on the PyTorch (Paszke et al., 2019) 1.7.1 and trained on 4 NVIDIA A100 GPUs."
Software Dependencies | Yes | "All models are conducted on the PyTorch (Paszke et al., 2019) 1.7.1 and trained on 4 NVIDIA A100 GPUs."
Experiment Setup | Yes | "More implementation details can be found in Section A2. As the settings in the CoCoOp and CoOp are different, we re-run the CoCoOp method in the setting of CoOp. We observed that all prompt learning methods outperform the linear probe method by a large margin." From Section A2.2: the length of learnable context tokens is set as 16; RN50 (He et al., 2016) is used as the backbone network; training uses an SGD optimizer with a 0.002 initial learning rate, a cosine annealing LR schedule, and a warmup trick with a 1e-5 learning rate. For small datasets such as FGVCAircraft, Oxford Flowers, and Stanford Cars, the batch size is set as 32, while for larger datasets such as ImageNet and SUN397, the batch size is set as 128. The method applies N = 4 prompts for each category and uses M = 7 × 7 due to the feature map size. The hyper-parameter in the Sinkhorn distance algorithm (Cuturi, 2013) is set as λ = 0.1 for all the datasets. The maximum iteration number of the inner loop is 100, with early stopping once the average absolute update value Λ < 0.01.
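The few-shot protocol quoted in the Dataset Splits row (1, 2, 4, 8, or 16 shots per class for training, with the original test set kept for evaluation) amounts to a per-class subsampling of the training split. A minimal sketch, assuming a flat list of integer class labels; the function name and fixed seed are illustrative, not taken from the paper or its code:

```python
import random
from collections import defaultdict

def sample_few_shot(labels, shots, seed=0):
    """Pick `shots` training indices per class (CLIP/CoOp-style
    few-shot protocol). The test set is left untouched."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    picked = []
    for idxs in by_class.values():
        # Classes with fewer examples than `shots` keep all of them.
        picked.extend(rng.sample(idxs, min(shots, len(idxs))))
    return sorted(picked)
```

For example, `sample_few_shot(train_labels, 16)` would return the indices of a 16-shot training subset drawn uniformly within each class.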
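The inner loop described in the setup (Sinkhorn iterations with λ = 0.1, at most 100 iterations, early stop when the average absolute update falls below 0.01) follows Cuturi's entropic optimal-transport algorithm. A self-contained NumPy sketch, not the authors' released code; the function name, uniform marginals, and the choice to monitor the scaling vector `u` are assumptions:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, max_iter=100, thresh=0.01):
    """Entropic OT between N prompts and M local visual features
    (M = 7 * 7 for an RN50 feature map), with uniform marginals.
    Returns the transport cost and the transport plan."""
    N, M = cost.shape
    mu = np.full(N, 1.0 / N)          # marginal over prompts
    nu = np.full(M, 1.0 / M)          # marginal over feature-map patches
    K = np.exp(-cost / eps)           # Gibbs kernel, eps plays the role of lambda
    u, v = np.ones(N), np.ones(M)
    for _ in range(max_iter):
        u_prev = u
        u = mu / (K @ v)              # row-scaling update
        v = nu / (K.T @ u)            # column-scaling update
        # Early stop when the average absolute update is small.
        if np.abs(u - u_prev).mean() < thresh:
            break
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum(), plan
```

In the paper's pipeline the resulting transport cost (one per class, aggregating the N = 4 prompts against the M = 49 patch features) replaces the single cosine distance used by CoOp-style prompt learning.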