On Pitfalls of Test-Time Adaptation
Authors: Hao Zhao, Yuejiang Liu, Alexandre Alahi, Tao Lin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, our benchmark reveals three common pitfalls in prior efforts. |
| Researcher Affiliation | Academia | Hao Zhao, Yuejiang Liu, Alexandre Alahi: École Polytechnique Fédérale de Lausanne (EPFL); Tao Lin: Research Center for Industries of the Future and School of Engineering, Westlake University. Hao Zhao and Yuejiang Liu contributed equally. |
| Pseudocode | Yes | Algorithm 1 Oracle model selection for online TTA (a hedged sketch follows the table) |
| Open Source Code | Yes | Our code is available at https://github.com/lins-lab/ttab. |
| Open Datasets | Yes | To streamline standardized evaluations of TTA methods, we first equip the benchmark library with shared data loaders for a set of common datasets, including CIFAR10-C (Hendrycks & Dietterich, 2019), CIFAR10.1 (Recht et al., 2018), ImageNet-C (Hendrycks & Dietterich, 2019), Office-Home (Venkateswara et al., 2017), PACS (Li et al., 2017), Colored MNIST (Arjovsky et al., 2019), and Waterbirds (Sagawa et al., 2019). |
| Dataset Splits | Yes | Training-domain validation data is used to determine the number of supports to store in T3A, following Iwasawa & Matsuo (2021). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use ResNet-18/ResNet-26/ResNet-50 as the base model on Colored MNIST/CIFAR10-C/large-scale image datasets and always choose SGDm as the optimizer. We choose method-specific hyperparameters following prior work. Following Iwasawa & Matsuo (2021), we assign the pseudo label in SHOT if the predictions are over a threshold, which is 0.9 in our experiments, and utilize β = 0.3 for all experiments except β = 0.1 for Colored MNIST, as in Liang et al. (2020). We set the number of augmentations B = 32 for small-scale images (e.g., CIFAR10-C, CIFAR100-C) and B = 64 for large-scale image sets like ImageNet-C, because this is the default option in Sun et al. (2020) and Zhang et al. We simply set N = 0, which controls the trade-off between source and estimated target statistics, because it achieves performance comparable to the best performance when using a batch size of 64, according to Schneider et al. (2020) (this statistic-mixing rule is sketched after the table). |
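The Pseudocode row cites Algorithm 1, the paper's oracle model selection for online TTA. As a rough illustration of what oracle selection entails, the following Python sketch is a hypothetical reconstruction, not the authors' algorithm verbatim: the function names (`run_online_tta`, `oracle_model_selection`, `adapt_step_factory`) and the candidate loop are assumptions. The defining property is that ground-truth test labels score candidate runs but never drive the adaptation updates themselves.

```python
# Hypothetical sketch of oracle model selection for online TTA
# (not the paper's Algorithm 1 verbatim): each candidate
# hyperparameter setting is run through the same online test stream,
# and an "oracle" scores runs with ground-truth test labels, which
# are never used for the adaptation updates.
import copy


def run_online_tta(model, stream, adapt_step):
    """Adapt `model` online over (x, y) batches; return oracle accuracy.

    `adapt_step` performs the unsupervised TTA update on x and returns
    predictions; labels y are used ONLY for oracle evaluation.
    """
    correct, total = 0, 0
    for x, y in stream:
        preds = adapt_step(model, x)                 # unsupervised update
        correct += (preds.argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)


def oracle_model_selection(base_model, stream_factory, candidates,
                           adapt_step_factory):
    """Return the candidate config with the best oracle accuracy."""
    best_acc, best_cfg = -1.0, None
    for cfg in candidates:                           # e.g. lr, #steps
        model = copy.deepcopy(base_model)            # fresh copy per run
        acc = run_online_tta(model, stream_factory(),
                             adapt_step_factory(cfg))
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
    return best_cfg, best_acc
```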
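The Experiment Setup row fixes N = 0 in the source/target statistic trade-off of Schneider et al. (2020). Below is a hedged sketch of that mixing rule as commonly written, with the hypothetical helper `mix_bn_stats`: a source prior of strength N is blended with batch-norm statistics estimated from a test batch of size n, so N = 0 reduces to pure target-batch statistics.

```python
# Minimal sketch of the batch-norm statistic mixing described in
# Schneider et al. (2020); `mix_bn_stats` is a hypothetical name.
# A source prior of strength N is blended with statistics from a
# test batch of size n; N = 0 keeps only the target statistics.
import torch


def mix_bn_stats(mu_src, var_src, x_batch, N):
    """Blend per-channel source BN statistics with test-batch estimates."""
    n = x_batch.shape[0]                          # test batch size
    mu_tgt = x_batch.mean(dim=0)
    var_tgt = x_batch.var(dim=0, unbiased=False)
    mu = (N * mu_src + n * mu_tgt) / (N + n)
    var = (N * var_src + n * var_tgt) / (N + n)
    return mu, var


mu_src, var_src = torch.zeros(8), torch.ones(8)   # source statistics
x = torch.randn(64, 8)                            # test batch, n = 64
mu, var = mix_bn_stats(mu_src, var_src, x, N=0)   # pure target statistics
```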