DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Authors: Zhi Zhou, Ming Yang, Jiang-Xin Shi, Lan-Zhe Guo, Yu-Feng Li

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on 11 benchmark datasets validate the effectiveness of DePT and demonstrate that DeCoOp outperforms state-of-the-art methods, providing a significant 2% average accuracy improvement.
Researcher Affiliation Academia 1National Key Laboratory for Novel Software Technology, Nanjing University, China 2School of Artificial Intelligence, Nanjing University, China 3School of Intelligence Science and Technology, Nanjing University, China.
Pseudocode No The paper describes the proposed methods (new-class detector M_D, sub-classifier M_C, inference) in prose and mathematical equations but does not provide any structured pseudocode or algorithm blocks.
Open Source Code Yes The implementation code for this work is available at https://github.com/WNJXYK/DeCoOp.
Open Datasets Yes We conducted evaluations of our proposed DeCoOp framework along with comparison methods on various image classification tasks. These tasks included general object recognition using the ImageNet (Deng et al., 2009) and Caltech-101 (Fei-Fei et al., 2007) datasets; fine-grained object recognition involving datasets such as Oxford Pets (Parkhi et al., 2012), Food-101 (Bossard et al., 2014), Stanford Cars (Krause et al., 2013), Oxford Flowers 102 (Nilsback & Zisserman, 2008), and FGVC Aircraft (Maji et al., 2013). Additionally, we performed a remote sensing recognition task using the EuroSAT (Helber et al., 2019) dataset, a texture recognition task using the DTD (Cimpoi et al., 2014) dataset, an action recognition task using the UCF101 (Soomro et al., 2012) dataset, and a large-scale scene understanding task using the SUN397 (Xiao et al., 2010) dataset.
Dataset Splits Yes This setting involves partitioning the class space of each dataset equally, with 50% of the classes designated as base classes and the remaining 50% as new classes. Consequently, for each dataset, prompts are learned for downstream tasks using 16 labeled samples per base class, drawn from the training set. The efficacy of these learned prompts is subsequently evaluated on the entire testing set, encompassing both base and new classes.
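The base-to-new protocol quoted above (equal 50/50 class partition, 16 labeled training samples per base class) can be sketched as follows. This is an illustrative sketch with hypothetical function names and toy data, not the paper's released code.

```python
import random

def split_base_new(class_names):
    """Partition the class space equally: first half base, second half new."""
    half = len(class_names) // 2
    return class_names[:half], class_names[half:]

def sample_few_shot(train_set, base_classes, shots=16, seed=1):
    """Draw `shots` labeled training examples per base class."""
    rng = random.Random(seed)
    support = {}
    for c in base_classes:
        pool = [x for x, y in train_set if y == c]
        support[c] = rng.sample(pool, shots)
    return support

# toy example: 4 classes, 20 training images each (names are made up)
classes = ["cat", "dog", "car", "plane"]
train = [(f"{c}_{i}.jpg", c) for c in classes for i in range(20)]
base, new = split_base_new(classes)            # base = ["cat", "dog"]
support = sample_few_shot(train, base, shots=16)
```

Prompts would then be tuned on `support` only, while evaluation covers the full test set over both `base` and `new` classes.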
Hardware Specification Yes All experiments were conducted on Linux servers equipped with NVIDIA A800 GPUs.
Software Dependencies No The paper mentions using an 'SGD optimizer' and setting a 'learning rate lr' and 'cosine decay schedule', but does not specify software library names with version numbers (e.g., PyTorch, TensorFlow, CUDA versions) that would be needed for replication.
Experiment Setup Yes The number of tokens in each prompt is set to 16 for the DeCoOp approach and comparison methods. We train the prompts of the new-class detectors for 50 epochs using the SGD optimizer and subsequently train the prompts for the sub-classifiers for 100 epochs, also using the SGD optimizer. The learning rate lr is set to 0.002 and follows a cosine decay schedule. The margin γ is set to 0.4 for all datasets. ... The batch size for images is 32 across all datasets. ... We report the average results over 5 runs with different random seeds {1, 2, 3, 4, 5}.
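The cosine decay schedule mentioned in the setup can be sketched with the standard cosine-annealing formula (decaying from the base learning rate toward 0); the exact schedule used by the authors is not specified beyond "cosine decay", so this is an assumption.

```python
import math

def cosine_decay_lr(base_lr, epoch, total_epochs):
    """Cosine-annealed learning rate: base_lr at epoch 0, approaching 0 at the end."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# reported settings: lr = 0.002, 50 epochs for the new-class detectors
base_lr, total = 0.002, 50
schedule = [cosine_decay_lr(base_lr, e, total) for e in range(total)]
```

With eta_min = 0, this matches the update rule of PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`, which is the usual way such a schedule is configured in practice.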