Robust Test-Time Adaptation for Zero-Shot Prompt Tuning
Authors: Ding-Chu Zhang, Zhi Zhou, Yu-Feng Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on several benchmarks demonstrate that ADAPROMPT alleviates model bias, adapts to data bias and mostly outperforms the state-of-the-art methods at a small time cost. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China |
| Pseudocode | Yes | Algorithm 1: Confidence-aware Buffer. Input: sample x_t, pseudo label ŷ(x_t), confidence c(x_t). Parameter: threshold τ. 1: if c(x_t) > τ then 2: if buffer is not full then 3: Add(x_t, ŷ(x_t), c(x_t)) 4: else 5: M ← majority class(es) in buffer 6: if ŷ(x_t) ∉ M then 7: Randomly select a class in M and discard the instance (x_i, ŷ(x_i), c(x_i)) with the lowest confidence in that class, where ŷ(x_i) ∈ M 8: Add(x_t, ŷ(x_t), c(x_t)) 9: else 10: c(x_j) ← the minimum confidence value in class ŷ(x_t) 11: if c(x_j) < c(x_t) then 12: Discard the instance (x_j, ŷ(x_j), c(x_j)) from the buffer 13: Add(x_t, ŷ(x_t), c(x_t)) 14: end if 15: end if 16: end if 17: end if |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on two standard benchmarks: CIFAR10-C and CIFAR100-C (Hendrycks and Dietterich 2019) |
| Dataset Splits | No | Different from the previous methods that require training on the training set, we directly update prompts with unlabeled test data and then predict on them. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU models, or cloud computing instance types were mentioned for running experiments. |
| Software Dependencies | No | The paper mentions models and optimizers (e.g., CLIP, AdamW) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | For ADAPROMPT, we set 64 as our buffer size and use three different hand-crafted prompts for ensembling: "an image of a [CLASS]", "a colorful image of a [CLASS]" and "a noisy picture of a [CLASS]". Moreover, we set the batch size to 64 following previous studies (Boudiaf et al. 2022; Niu et al. 2022). The AdamW optimizer optimizes all the prompts with a learning rate of 0.005. We report mean ± std accuracy over five runs with random seeds set to 0, 1, 2, 3, 4. |
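The confidence-aware buffer in Algorithm 1 can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the class name, the list-based storage, and the tie-breaking by `random.choice` over majority classes are our assumptions based on the pseudocode quoted above.

```python
import random
from collections import Counter

class ConfidenceAwareBuffer:
    """Sketch of Algorithm 1 (Confidence-aware Buffer).

    Stores (sample, pseudo_label, confidence) triples, admitting only
    confident samples and evicting from majority classes to keep the
    buffer class-balanced. Names are illustrative assumptions.
    """

    def __init__(self, capacity, tau):
        self.capacity = capacity   # buffer size (64 in the paper)
        self.tau = tau             # confidence threshold τ
        self.items = []            # list of (x, y_hat, conf) triples

    def add(self, x, y_hat, conf):
        # Line 1: only samples above the confidence threshold enter.
        if conf <= self.tau:
            return
        # Lines 2-3: room left, just append.
        if len(self.items) < self.capacity:
            self.items.append((x, y_hat, conf))
            return
        # Line 5: find the majority class(es) M in the buffer.
        counts = Counter(y for _, y, _ in self.items)
        max_count = max(counts.values())
        majority = [y for y, c in counts.items() if c == max_count]
        if y_hat not in majority:
            # Lines 6-8: evict the lowest-confidence instance of a
            # randomly chosen majority class, then add the new sample.
            cls = random.choice(majority)
            victim = min(
                (i for i, (_, y, _) in enumerate(self.items) if y == cls),
                key=lambda i: self.items[i][2],
            )
            self.items.pop(victim)
            self.items.append((x, y_hat, conf))
        else:
            # Lines 10-13: replace the least confident same-class
            # instance only if the new sample is more confident.
            same = [i for i, (_, y, _) in enumerate(self.items) if y == y_hat]
            j = min(same, key=lambda i: self.items[i][2])
            if self.items[j][2] < conf:
                self.items.pop(j)
                self.items.append((x, y_hat, conf))
```

Under this reading, the buffer never grows past `capacity`, discards low-confidence samples outright, and only displaces existing entries when doing so improves class balance or within-class confidence.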