DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation

Authors: Zichen Liu, Hongbo Sun, Yuxin Peng, Jiahuan Zhou

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various large-scale benchmarks demonstrate the effectiveness of our proposed DART against state-of-the-art methods.
Researcher Affiliation | Academia | Wangxuan Institute of Computer Technology, Peking University; lzc20180720@stu.pku.edu.cn, {sunhongbo, pengyuxin, jiahuanzhou}@pku.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Since data distribution shift will inevitably occur in real-world scenarios, the experiments are conducted on three large-scale benchmarks, ImageNet-A (Hendrycks et al. 2021b), ImageNet-R (Hendrycks et al. 2021a), and ImageNet-Sketch (Wang et al. 2019), which are variants of the ImageNet (Deng et al. 2009) dataset, to evaluate the performance of different methods for improving the test-time generalization ability of CLIP.
Dataset Splits | No | The paper describes processing individual test samples in an online manner and notes that some few-shot methods use "16-shot extra training images," but it does not specify explicit training/validation/test splits (e.g., percentages or counts) needed to reproduce the experimental setup from scratch.
Hardware Specification | Yes | All experiments are implemented on a single NVIDIA 4090 GPU.
Software Dependencies | No | The paper mentions software components such as CLIP, ViT-B/16, and an Adam optimizer, but it does not provide specific version numbers for these or other underlying software libraries/frameworks (e.g., Python, PyTorch, CUDA) that would be necessary for exact reproduction.
Experiment Setup | Yes | For each test image, we initialize all the text prompts in our DART as "a photo of a [CLASS]." The image prompts are initialized with a uniform distribution over (−1, 1), following the previous visual prompting methods (Wang et al. 2022e,d). The length of image prompts is set to 2, and they are added to the second layer of the CLIP image encoder. The hyper-parameters h, w_T, and w_I of the dual-modal knowledge retention prompts are set to 5000, 0.1, and 0.1, respectively. For the learning of DART, we use randomly resized crops to augment the single test sample to obtain a batch of B = 64 images, and the confidence threshold ρ follows the same setting as in (Shu et al. 2022). An Adam optimizer with a learning rate of 0.003 is used to optimize the prompts P.
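
Since no official code is released, the following is a minimal sketch of the quoted per-sample setup (B = 64 random resized crops, Adam with learning rate 0.003), assuming PyTorch and torchvision. The `model` interface, the `prompts` argument, and the top-10% confidence cutoff for ρ are assumptions taken from the TPT baseline (Shu et al. 2022) that the paper says it follows; DART's dual-modal knowledge-retention terms (h, w_T, w_I) are omitted, so this is an illustration, not the authors' implementation.

```python
# Sketch only: confidence-filtered marginal-entropy adaptation in the style
# of TPT (Shu et al. 2022), with DART's knowledge-retention loss omitted.
# `model` is a hypothetical CLIP-like module whose learnable text/image
# prompt tensors are passed in as `prompts`.
import torch
from torchvision import transforms

B, LR = 64, 3e-3   # batch of augmented views and Adam learning rate (paper settings)
RHO = 0.1          # assumed: keep the 10% most confident views, as in TPT

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),  # random resized crops, as in the paper
    transforms.ToTensor(),
])

def adapt_on_test_image(model, pil_image, prompts):
    """One online adaptation step on a single test sample."""
    optimizer = torch.optim.Adam(prompts, lr=LR)
    views = torch.stack([augment(pil_image) for _ in range(B)])
    probs = model(views).softmax(dim=-1)                       # (B, num_classes)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # per-view entropy
    keep = entropy.topk(max(1, int(B * RHO)), largest=False).indices
    avg = probs[keep].mean(dim=0).clamp_min(1e-12)
    loss = -(avg * avg.log()).sum()   # entropy of the averaged confident prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model(views[:1]).argmax(dim=-1)  # prediction with the adapted prompts
```

In the full method, this per-sample update would additionally be regularized by the dual-modal knowledge retention prompts (the h = 5000, w_T = w_I = 0.1 hyper-parameters quoted above), which the sketch leaves out.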