Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DART: Dual-Modal Adaptive Online Prompting and Knowledge Retention for Test-Time Adaptation
Authors: Zichen Liu, Hongbo Sun, Yuxin Peng, Jiahuan Zhou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various large-scale benchmarks demonstrate the effectiveness of our proposed DART against state-of-the-art methods. |
| Researcher Affiliation | Academia | Wangxuan Institute of Computer Technology, Peking University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Since the data distribution shifting will inevitably occur in real-world scenarios, the experiments are conducted on three large-scale benchmarks, ImageNet-A (Hendrycks et al. 2021b), ImageNet-R (Hendrycks et al. 2021a), and ImageNet-Sketch (Wang et al. 2019), which are variants of the ImageNet (Deng et al. 2009) dataset, to evaluate the performance of different methods for improving the test-time generalization ability of CLIP. |
| Dataset Splits | No | The paper describes processing individual test samples in an online manner and notes that some few-shot methods use "16-shot extra training images," but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and testing that would be needed to reproduce the experimental setup from scratch. |
| Hardware Specification | Yes | All experiments are implemented on a single NVIDIA 4090 GPU. |
| Software Dependencies | No | The paper mentions software components such as CLIP, ViT-B/16, and an Adam optimizer, but it does not provide specific version numbers for these or for other underlying software libraries/frameworks (e.g., Python, PyTorch, CUDA) that would be necessary for exact reproduction. |
| Experiment Setup | Yes | For each test image, we initialize all the text prompts in our DART as "a photo of a ." The image prompts are initialized with a uniform distribution of (−1, 1) following the previous visual prompting methods (Wang et al. 2022e,d). The length of image prompts is set to 2, and they are added to the second layer of the CLIP image encoder. The hyper-parameters h, w_T, and w_I of dual-modal knowledge retention prompts are set to 5000, 0.1, and 0.1 respectively. For the learning of DART, we use randomly resized crops to augment the single test sample to obtain a batch of B = 64 images, and the confidence threshold ρ follows the same setting in (Shu et al. 2022). An Adam optimizer with a learning rate of 0.003 is used to optimize the prompts P. |
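The setup quoted above can be captured in a short, hedged sketch. The paper's code is not released, so all names below (`init_image_prompts`, the embedding dimension of 768, the config dict) are illustrative assumptions, not the authors' implementation; only the hyper-parameter values themselves come from the paper.

```python
import random

# Hyper-parameters reported in the Experiment Setup row (values from the
# paper; variable names are our own).
CONFIG = {
    "batch_size_B": 64,       # augmented views per single test image
    "learning_rate": 0.003,   # Adam learning rate for the prompts P
    "image_prompt_len": 2,    # prompt tokens added at CLIP image-encoder layer 2
    "h": 5000,                # dual-modal knowledge-retention hyper-parameters
    "w_T": 0.1,
    "w_I": 0.1,
}

def init_image_prompts(length=2, dim=768, seed=0):
    """Initialize image prompts from U(-1, 1), as stated in the paper.

    `dim=768` is an assumption (typical ViT-B/16 hidden size); the paper
    does not quote the prompt dimensionality in this excerpt.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(length)]

prompts = init_image_prompts(length=CONFIG["image_prompt_len"])
```

In an actual PyTorch reproduction, `prompts` would be an `nn.Parameter` optimized by `torch.optim.Adam(..., lr=0.003)` over a batch of 64 randomly resized crops of the single test image, with low-confidence views filtered out before the update.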