Domain-Controlled Prompt Learning
Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is extensively evaluated on specific-domain datasets, and the experimental results demonstrate that it achieves state-of-the-art performance in specific-domain image recognition. |
| Researcher Affiliation | Academia | Qinglong Cao1,2, Zhengqin Xu1, Yuntian Chen2*, Chao Ma1, Xiaokang Yang1 — 1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo. {caoql2022, fate311}@sjtu.edu.cn, ychen@eitech.edu.cn, {chaoma, xkyang}@sjtu.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/caoql98/DCPL. |
| Open Datasets | Yes | The proposed method was evaluated on eight remote sensing datasets, namely MLRSNet (Qi et al. 2020), PatternNet (Zhou et al. 2018), RSSCN7 (Zou et al. 2015), AID (Xia et al. 2017), RSICD (Lu et al. 2017), UCM (Yang and Newsam 2010), WHU-RS19 (Dai and Yang 2011), and NWPU (Cheng, Han, and Lu 2017). |
| Dataset Splits | Yes | All experiments were conducted using a few-shot training strategy with 16 shots, randomly sampled for each class. For the base-to-novel generalization setting, experiments were conducted on all eight remote sensing datasets. In the cross-dataset generalization and domain generalization settings, MLRSNet was used as the source dataset, while the remaining datasets served as the target datasets. |
| Hardware Specification | Yes | We utilized the SGD optimizer and trained models on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using a 'Pre-trained ViT-B/16 CLIP model' and describes network architectures (e.g., 'two linear layers followed by a ReLU activation layer'), but it does not specify version numbers for any software dependencies such as Python, PyTorch, or specific libraries. |
| Experiment Setup | Yes | All experiments were conducted using a few-shot training strategy with 16 shots, randomly sampled for each class. A pre-trained ViT-B/16 CLIP model is used as the basis for prompt tuning. The training process for all models lasted for 5 epochs, employing a batch size of 4 and a learning rate of 0.0035. We utilized the SGD optimizer and trained models on a single NVIDIA A100 GPU. The template for the word embeddings is 'a photo of category'. We kept the hyperparameters consistent across all datasets to ensure fair comparisons. The language and visual control networks were implemented as two independent networks with the same architecture. Each network consisted of two linear layers followed by a ReLU activation layer. |
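The Experiment Setup row describes the control networks only at a high level ("two linear layers followed by a ReLU activation layer"). A minimal, dependency-free sketch of that shape is given below; the layer widths, the placement of the ReLU between the two linear layers, and all variable names are assumptions for illustration, not details taken from the paper:

```python
import random

def linear(x, w, b):
    # Dense layer: w is an (out x in) weight matrix, b a length-out bias.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def control_network(x, w1, b1, w2, b2):
    # Assumed structure: linear -> ReLU -> linear, as in a bottleneck adapter.
    return linear(relu(linear(x, w1, b1)), w2, b2)

# Toy dimensions for the demo (the paper does not report the actual widths;
# CLIP ViT-B/16's joint embedding dimension is 512 in practice).
random.seed(0)
d_in, d_hidden, d_out = 8, 4, 8
w1 = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [0.0] * d_hidden
w2 = [[random.uniform(-0.1, 0.1) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [0.0] * d_out

x = [1.0] * d_in
y = control_network(x, w1, b1, w2, b2)
assert len(y) == d_out
```

In a real reproduction these layers would be `torch.nn.Linear` modules trained with the reported SGD settings (5 epochs, batch size 4, learning rate 0.0035); the pure-Python version above only illustrates the forward-pass structure.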