WATT: Weight Average Test-Time Adaptation of CLIP
Authors: David Osowiechi, Mehrdad Noori, Gustavo Vargas Hakim, Moslem Yazdanpanah, Ali Bahri, Milad Cheraghalikhani, Sahar Dastani, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings underscore the effectiveness of WATT across diverse datasets, including CIFAR-10-C, CIFAR-10.1, CIFAR-100-C, VisDA-C, and several other challenging datasets, effectively covering a wide range of domain shifts. Notably, these enhancements are achieved without the need for additional model transformations or trainable modules. Moreover, compared to other TTA methods, our approach can operate effectively with just a single image. |
| Researcher Affiliation | Academia | LIVIA, ÉTS Montréal, Canada; International Laboratory on Learning Systems (ILLS) |
| Pseudocode | Yes | In Algorithms 1 and 2, we compare the two variants of WATT: one with Parallel MTWA (WATT-P) and the other with Sequential MTWA (WATT-S). Algorithm 1: WATT-P (model f, parameters θ)... Algorithm 2: WATT-S (model f, parameters θ). A sketch of the sequential variant is given after this table. |
| Open Source Code | Yes | The code is available at: https://github.com/Mehrdad-Noori/WATT. |
| Open Datasets | Yes | Datasets. Following [24], we rigorously evaluate WATT's performance across diverse TTA datasets using established assessment techniques. ...We include CIFAR-10, CIFAR-10.1, and CIFAR-100... We also incorporate the CIFAR-10-C and CIFAR-100-C datasets [28]... Our investigation also extends to the VisDA-C dataset [29]... Additionally, we evaluate our method on three datasets mostly used in the field of domain generalization: PACS [30], VLCS [31], and Office-Home [32] datasets... |
| Dataset Splits | Yes | In our assessment of natural image analysis, we include CIFAR-10, CIFAR-10.1, and CIFAR-100, each comprising 10,000 images and offering varied data distributions. CIFAR-10.1 [28] introduces a natural shift from CIFAR-10, providing a comprehensive evaluation of our model's performance. We also incorporate the CIFAR-10-C and CIFAR-100-C datasets [28], augmented with 15 distinct corruptions across 5 severity levels, resulting in 75 common corruption scenarios. This comprehensive augmentation assesses the model's resilience effectively. ...Our investigation also extends to the VisDA-C dataset [29], challenging models with simulated and video shifts across diverse imagery types. Additionally, we evaluate our method on three datasets mostly used in the field of domain generalization: PACS [30], VLCS [31], and Office-Home [32] datasets, instrumental in understanding texture and style variations. These evaluations effectively demonstrate the generalizability of our method across distinct domain shifts. ...Results on the 3D (simulated shift) and YT (video shift) splits of VisDA-C demonstrate a significant improvement... (A sketch of iterating over the 75 CIFAR-10-C scenarios follows the table.) |
| Hardware Specification | Yes | All experiments were conducted on an NVIDIA V100 32 GB GPU. ...We conduct a thorough evaluation under consistent conditions using an NVIDIA A6000 GPU within the same Python environment. |
| Software Dependencies | Yes | Our proposed WATT method is implemented in Python using the PyTorch (version 2.0.1) framework. |
| Experiment Setup | Yes | The Adam optimizer is employed with a fixed learning rate of 10⁻³, whereas a smaller learning rate of 10⁻⁴ is chosen for adaptation to the 3D renderings split, as it reflects a more pronounced shift. Throughout our experimentation process, a consistent batch size of 128 is maintained to ensure uniformity and facilitate meaningful comparisons across various scenarios. (A minimal configuration sketch appears below.) |
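
To make the pseudocode row concrete, below is a minimal PyTorch sketch of the sequential variant (WATT-S) as we read it: adapt the model once per text template, average the adapted weights, and repeat. `adapt_one_step` is a hypothetical placeholder for the per-template update (the paper adapts CLIP's layer-norm parameters); the authors' actual implementation is in the linked repository.

```python
import copy
import torch

def watt_s(model, images, templates, num_iters=10):
    """Hedged sketch of Sequential Multi-Template Weight Averaging (WATT-S).

    `adapt_one_step` is a hypothetical helper, not the authors' API: it would
    run one adaptation step (e.g. minimizing prediction entropy w.r.t. the
    layer-norm parameters) for a single text template.
    """
    for _ in range(num_iters):
        base_state = copy.deepcopy(model.state_dict())
        adapted_states = []
        for template in templates:
            model.load_state_dict(base_state)        # restart from the averaged weights
            adapt_one_step(model, images, template)  # placeholder per-template update
            adapted_states.append(copy.deepcopy(model.state_dict()))
        # MTWA step: average the adapted weights across templates.
        avg_state = {}
        for k, v in base_state.items():
            if v.dtype.is_floating_point:
                avg_state[k] = torch.stack([s[k] for s in adapted_states]).mean(dim=0)
            else:
                avg_state[k] = v  # leave integer buffers (e.g. counters) untouched
        model.load_state_dict(avg_state)
    return model
```

Per Algorithms 1 and 2, WATT-P differs mainly in when the average is taken: each template is adapted from the same starting weights for several iterations before a single averaging step.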
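
The 15 corruptions × 5 severity levels quoted in the Dataset Splits row follow the standard Hendrycks & Dietterich CIFAR-10-C release: one `.npy` file per corruption, stacking 10,000 test images for each of the 5 severities. A small loader sketch, assuming that standard layout:

```python
import numpy as np

# The 15 standard CIFAR-10-C corruption types.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
    "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
    "brightness", "contrast", "elastic_transform", "pixelate", "jpeg_compression",
]

def iter_cifar10c(root):
    """Yield (corruption, severity, images, labels) for all 75 scenarios."""
    labels = np.load(f"{root}/labels.npy")
    for name in CORRUPTIONS:
        data = np.load(f"{root}/{name}.npy")  # shape: (50000, 32, 32, 3)
        for severity in range(1, 6):
            lo, hi = (severity - 1) * 10000, severity * 10000
            yield name, severity, data[lo:hi], labels[lo:hi]
```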
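
Finally, a minimal sketch of the reported optimization settings (Adam, learning rate 10⁻³, lowered to 10⁻⁴ for the VisDA-C 3D renderings split, batch size 128). The model, dataset, and split name here are stand-ins for illustration, not the paper's actual objects:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the adapted model and the evaluation split.
model = nn.Linear(512, 10)  # placeholder for the adapted CLIP parameters
test_dataset = torch.utils.data.TensorDataset(torch.randn(256, 512))
split = "cifar10_c"         # e.g. "visda_3d" for the 3D renderings split

# Reported settings: Adam with lr 1e-3 (1e-4 for the VisDA-C 3D split), batch size 128.
lr = 1e-4 if split == "visda_3d" else 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loader = torch.utils.data.DataLoader(test_dataset, batch_size=128, shuffle=False)
```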