Domain-Controlled Prompt Learning

Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show our method achieves state-of-the-art performance in specific domain image recognition datasets. Our method is extensively evaluated on specific domain datasets. The experimental results demonstrate our method achieves state-of-the-art performance.
Researcher Affiliation | Academia | Qinglong Cao1,2, Zhengqin Xu1, Yuntian Chen2*, Chao Ma1, Xiaokang Yang1. 1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. 2Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo. {caoql2022, fate311}@sjtu.edu.cn, ychen@eitech.edu.cn, {chaoma, xkyang}@sjtu.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/caoql98/DCPL.
Open Datasets | Yes | The proposed method was evaluated on eight remote sensing datasets, namely MLRSNet (Qi et al. 2020), PatternNet (Zhou et al. 2018), RSSCN7 (Zou et al. 2015), AID (Xia et al. 2017), RSICD (Lu et al. 2017), UCM (Yang and Newsam 2010), WHU-RS19 (Dai and Yang 2011), and NWPU (Cheng, Han, and Lu 2017).
Dataset Splits | Yes | All experiments were conducted using a few-shot training strategy with 16 shots, randomly sampled for each class. For the base-to-novel generalization setting, experiments were conducted on all eight remote sensing datasets. In the cross-dataset generalization and domain generalization settings, MLRSNet was used as the source dataset, while the remaining datasets served as the target datasets.
Hardware Specification | Yes | We utilized the SGD optimizer and trained models on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using a 'pre-trained ViT-B/16 CLIP model' and describes network architectures (e.g., 'two linear layers followed by a ReLU activation layer'), but it does not specify version numbers for any software dependencies like Python, PyTorch, or specific libraries.
Experiment Setup | Yes | All experiments were conducted using a few-shot training strategy with 16 shots, randomly sampled for each class. Pre-trained ViT-B/16 CLIP model is used as the basis for prompt tuning. The training process for all models lasted for 5 epochs, employing a batch size of 4 and a learning rate of 0.0035. We utilized the SGD optimizer and trained models on a single NVIDIA A100 GPU. The template for the word embeddings is 'a photo of category'. We kept the hyperparameters consistent across all datasets to ensure fair comparisons. The language and visual control networks were implemented as two independent networks with the same architecture. Each network consisted of two linear layers followed by a ReLU activation layer.
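
The 16-shot protocol quoted in the Dataset Splits entry is straightforward to reproduce. Below is a minimal sketch of per-class few-shot sampling; the `dataset` iterable of (image, label) pairs, the `sample_few_shot` helper, and the fixed seed are illustrative assumptions, not details from the released DCPL code.

```python
import random
from collections import defaultdict

def sample_few_shot(dataset, num_shots=16, seed=0):
    """Draw num_shots examples per class, as in the paper's 16-shot setting.

    `dataset` is assumed to be an iterable of (image, label) pairs; this
    interface is a placeholder, not the authors' actual data loader.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append((image, label))
    subset = []
    for label, items in by_class.items():
        # Guard against classes with fewer than num_shots examples.
        subset.extend(rng.sample(items, min(num_shots, len(items))))
    return subset
```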
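
Since the paper names the backbone but pins no software versions (see the Software Dependencies entry), one reasonable reconstruction uses the OpenAI `clip` reference package. The snippet below loads the pre-trained ViT-B/16 CLIP model; the choice of the `clip` package, rather than an alternative such as open_clip, is our assumption.

```python
import torch
import clip  # OpenAI reference implementation (github.com/openai/CLIP); version unpinned in the paper

device = "cuda" if torch.cuda.is_available() else "cpu"
# "ViT-B/16" is the backbone named in the paper; clip.load returns the
# model plus its matching image preprocessing transform.
model, preprocess = clip.load("ViT-B/16", device=device)
model.eval()  # the pre-trained CLIP backbone stays frozen during prompt tuning
```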
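
The Experiment Setup entry describes the two control networks and the optimizer precisely enough to sketch. The Linear-ReLU-Linear ordering and the hidden width below are our reading of "two linear layers followed by a ReLU activation layer" and are not confirmed; only the two independent identical networks, the SGD optimizer, and learning rate 0.0035 are quoted from the paper.

```python
import torch.nn as nn
import torch.optim as optim

class ControlNetwork(nn.Module):
    """Lightweight control network: two linear layers with a ReLU.

    The bottleneck shape (dim -> hidden -> dim) and hidden width are
    assumptions for illustration, not values reported in the paper.
    """
    def __init__(self, dim=512, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

# The paper states the language and visual control networks are two
# independent networks sharing the same architecture.
language_control = ControlNetwork()
visual_control = ControlNetwork()

# Optimizer settings quoted in the paper: SGD with learning rate 0.0035
# (trained for 5 epochs with batch size 4 on a single NVIDIA A100 GPU).
params = list(language_control.parameters()) + list(visual_control.parameters())
optimizer = optim.SGD(params, lr=0.0035)
```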