WATT: Weight Average Test-Time Adaptation of CLIP
Authors: David Osowiechi, Mehrdad Noori, Gustavo Vargas Hakim, Moslem Yazdanpanah, Ali Bahri, Milad Cheraghalikhani, Sahar Dastani, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings underscore the effectiveness of WATT across diverse datasets, including CIFAR-10-C, CIFAR-10.1, CIFAR-100-C, VisDA-C, and several other challenging datasets, effectively covering a wide range of domain shifts. Notably, these enhancements are achieved without the need for additional model transformations or trainable modules. Moreover, compared to other TTA methods, our approach can operate effectively with just a single image. |
| Researcher Affiliation | Academia | LIVIA, ÉTS Montréal, Canada; International Laboratory on Learning Systems (ILLS) |
| Pseudocode | Yes | In Algorithms 1 and 2, we compare the two variants of WATT: one with Parallel MTWA (WATT-P) and the other with Sequential MTWA (WATT-S). Algorithm 1: WATT-P (model f, parameters θ)... Algorithm 2: WATT-S (model f, parameters θ). A sketch of the sequential variant is given after this table. |
| Open Source Code | Yes | The code is available at: https://github.com/Mehrdad-Noori/WATT. |
| Open Datasets | Yes | Datasets. Following [24], we rigorously evaluate WATT's performance across diverse TTA datasets using established assessment techniques. ...We include CIFAR-10, CIFAR-10.1, and CIFAR-100... We also incorporate the CIFAR-10-C and CIFAR-100-C datasets [28]... Our investigation also extends to the VisDA-C dataset [29]... Additionally, we evaluate our method on three datasets mostly used in the field of domain generalization: PACS [30], VLCS [31], and Office-Home [32] datasets... |
| Dataset Splits | Yes | In our assessment of natural image analysis, we include CIFAR-10, CIFAR-10.1, and CIFAR-100, each comprising 10,000 images and offering varied data distributions. CIFAR-10.1 [28] introduces a natural shift from CIFAR-10, providing a comprehensive evaluation of our model's performance. We also incorporate the CIFAR-10-C and CIFAR-100-C datasets [28], augmented with 15 distinct corruptions across 5 severity levels, resulting in 75 common corruption scenarios. This comprehensive augmentation assesses the model's resilience effectively. ...Our investigation also extends to the VisDA-C dataset [29], challenging models with simulated and video shifts across diverse imagery types. Additionally, we evaluate our method on three datasets mostly used in the field of domain generalization: PACS [30], VLCS [31], and Office-Home [32] datasets, instrumental in understanding texture and style variations. These evaluations effectively demonstrate the generalizability of our method across distinct domain shifts. ...Results on the 3D (simulated shift) and YT (video shift) splits of VisDA-C demonstrate a significant improvement... (A sketch of iterating over the 75 CIFAR-10-C scenarios follows the table.) |
| Hardware Specification | Yes | All experiments were conducted on an NVIDIA V100 32 GB GPU. ...We conduct a thorough evaluation under consistent conditions using an NVIDIA A6000 GPU within the same Python environment. |
| Software Dependencies | Yes | Our proposed WATT method is implemented in Python using the PyTorch (version 2.0.1) framework. |
| Experiment Setup | Yes | The Adam optimizer is employed with a fixed learning rate of 10⁻³, whereas a smaller learning rate of 10⁻⁴ is chosen for adaptation to the 3D renderings split, as it reflects a more pronounced shift. Throughout our experimentation process, a consistent batch size of 128 is maintained to ensure uniformity and facilitate meaningful comparisons across various scenarios. (A minimal configuration sketch appears below.) |
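
To make the pseudocode row concrete, below is a minimal PyTorch sketch of the sequential variant (WATT-S) as we read it: adapt the model once per text template, average the adapted weights, and repeat. `adapt_one_step` is a hypothetical placeholder for the per-template update (the paper adapts CLIP's layer-norm parameters); the authors' actual implementation is in the linked repository.

```python
import copy
import torch

def watt_s(model, images, templates, num_iters=10):
    """Hedged sketch of Sequential Multi-Template Weight Averaging (WATT-S).

    `adapt_one_step` is a hypothetical helper, not the authors' API: it would
    run one adaptation step (e.g. minimizing prediction entropy w.r.t. the
    layer-norm parameters) for a single text template.
    """
    for _ in range(num_iters):
        base_state = copy.deepcopy(model.state_dict())
        adapted_states = []
        for template in templates:
            model.load_state_dict(base_state)        # restart from the averaged weights
            adapt_one_step(model, images, template)  # placeholder per-template update
            adapted_states.append(copy.deepcopy(model.state_dict()))
        # MTWA step: average the adapted weights across templates.
        avg_state = {}
        for k, v in base_state.items():
            if v.dtype.is_floating_point:
                avg_state[k] = torch.stack([s[k] for s in adapted_states]).mean(dim=0)
            else:
                avg_state[k] = v  # leave integer buffers (e.g. counters) untouched
        model.load_state_dict(avg_state)
    return model
```

Per Algorithms 1 and 2, WATT-P differs mainly in when the average is taken: each template is adapted from the same starting weights for several iterations before a single averaging step.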
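
The 15 corruptions × 5 severity levels quoted in the Dataset Splits row follow the standard Hendrycks & Dietterich CIFAR-10-C release: one `.npy` file per corruption, stacking 10,000 test images for each of the 5 severities. A small loader sketch, assuming that standard layout:

```python
import numpy as np

# The 15 standard CIFAR-10-C corruption types.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
    "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
    "brightness", "contrast", "elastic_transform", "pixelate", "jpeg_compression",
]

def iter_cifar10c(root):
    """Yield (corruption, severity, images, labels) for all 75 scenarios."""
    labels = np.load(f"{root}/labels.npy")
    for name in CORRUPTIONS:
        data = np.load(f"{root}/{name}.npy")  # shape: (50000, 32, 32, 3)
        for severity in range(1, 6):
            lo, hi = (severity - 1) * 10000, severity * 10000
            yield name, severity, data[lo:hi], labels[lo:hi]
```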
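
Finally, a minimal sketch of the reported optimization settings (Adam, learning rate 10⁻³, lowered to 10⁻⁴ for the VisDA-C 3D renderings split, batch size 128). The model, dataset, and split name here are stand-ins for illustration, not the paper's actual objects:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the adapted model and the evaluation split.
model = nn.Linear(512, 10)  # placeholder for the adapted CLIP parameters
test_dataset = torch.utils.data.TensorDataset(torch.randn(256, 512))
split = "cifar10_c"         # e.g. "visda_3d" for the 3D renderings split

# Reported settings: Adam with lr 1e-3 (1e-4 for the VisDA-C 3D split), batch size 128.
lr = 1e-4 if split == "visda_3d" else 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loader = torch.utils.data.DataLoader(test_dataset, batch_size=128, shuffle=False)
```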