DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation

Authors: Zichen Liu, Hongbo Sun, Yuxin Peng, Jiahuan Zhou

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various large-scale benchmarks demonstrate the effectiveness of our proposed DART against state-of-the-art methods.
Researcher Affiliation | Academia | Wangxuan Institute of Computer Technology, Peking University; lzc20180720@stu.pku.edu.cn, {sunhongbo, pengyuxin, jiahuanzhou}@pku.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Since data distribution shift will inevitably occur in real-world scenarios, the experiments are conducted on three large-scale benchmarks, ImageNet-A (Hendrycks et al. 2021b), ImageNet-R (Hendrycks et al. 2021a), and ImageNet-Sketch (Wang et al. 2019), which are variants of the ImageNet (Deng et al. 2009) dataset, to evaluate the performance of different methods for improving the test-time generalization ability of CLIP.
Dataset Splits | No | The paper describes processing individual test samples in an online manner and notes that some few-shot methods use "16-shot extra training images," but it does not specify explicit training/validation/test splits (e.g., percentages or counts) needed to reproduce the experimental setup from scratch.
Hardware Specification | Yes | All experiments are implemented on a single NVIDIA 4090 GPU.
Software Dependencies | No | The paper mentions software components such as CLIP, ViT-B/16, and an Adam optimizer, but it does not provide specific version numbers for these or other underlying software libraries/frameworks (e.g., Python, PyTorch, CUDA) that would be necessary for exact reproduction.
Experiment Setup | Yes | For each test image, we initialize all the text prompts in our DART as "a photo of a [CLASS]." The image prompts are initialized with a uniform distribution over (−1, 1), following the previous visual prompting methods (Wang et al. 2022e,d). The length of image prompts is set to 2, and they are added to the second layer of the CLIP image encoder. The hyper-parameters h, w_T, and w_I of the dual-modal knowledge retention prompts are set to 5000, 0.1, and 0.1, respectively. For the learning of DART, we use randomly resized crops to augment the single test sample to obtain a batch of B = 64 images, and the confidence threshold ρ follows the same setting as in (Shu et al. 2022). An Adam optimizer with a learning rate of 0.003 is used to optimize the prompts P.
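
Since no official code is released, the following is a minimal sketch of the quoted per-sample setup (B = 64 random resized crops, Adam with learning rate 0.003), assuming PyTorch and torchvision. The `model` interface, the `prompts` argument, and the top-10% confidence cutoff for ρ are assumptions taken from the TPT baseline (Shu et al. 2022) that the paper says it follows; DART's dual-modal knowledge-retention terms (h, w_T, w_I) are omitted, so this is an illustration, not the authors' implementation.

```python
# Sketch only: confidence-filtered marginal-entropy adaptation in the style
# of TPT (Shu et al. 2022), with DART's knowledge-retention loss omitted.
# `model` is a hypothetical CLIP-like module whose learnable text/image
# prompt tensors are passed in as `prompts`.
import torch
from torchvision import transforms

B, LR = 64, 3e-3   # batch of augmented views and Adam learning rate (paper settings)
RHO = 0.1          # assumed: keep the 10% most confident views, as in TPT

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),  # random resized crops, as in the paper
    transforms.ToTensor(),
])

def adapt_on_test_image(model, pil_image, prompts):
    """One online adaptation step on a single test sample."""
    optimizer = torch.optim.Adam(prompts, lr=LR)
    views = torch.stack([augment(pil_image) for _ in range(B)])
    probs = model(views).softmax(dim=-1)                       # (B, num_classes)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # per-view entropy
    keep = entropy.topk(max(1, int(B * RHO)), largest=False).indices
    avg = probs[keep].mean(dim=0).clamp_min(1e-12)
    loss = -(avg * avg.log()).sum()   # entropy of the averaged confident prediction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model(views[:1]).argmax(dim=-1)  # prediction with the adapted prompts
```

In the full method, this per-sample update would additionally be regularized by the dual-modal knowledge retention prompts (the h = 5000, w_T = w_I = 0.1 hyper-parameters quoted above), which the sketch leaves out.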