Towards Stable Test-time Adaptation in Dynamic Wild World

Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Mingkui Tan

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Promising results demonstrate that SAR performs more stably than prior methods and is computationally efficient under the above wild test scenarios. We conduct experiments based on ImageNet-C (Hendrycks & Dietterich, 2019), a large-scale and widely used benchmark for out-of-distribution generalization.
Researcher Affiliation Collaboration Shuaicheng Niu^{1,4}, Jiaxiang Wu^2, Yifan Zhang^3, Zhiquan Wen^1, Yaofo Chen^1, Peilin Zhao^2, Mingkui Tan^{1,5} (sensc@mail.scut.edu.cn; mingkuitan@scut.edu.cn) — ^1 South China University of Technology, ^2 Tencent AI Lab, ^3 National University of Singapore, ^4 Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, ^5 Pazhou Laboratory
Pseudocode Yes B PSEUDO CODE OF SAR In this appendix, we provide the pseudo-code of our SAR method. From Algorithm 1, for each test sample xj, we first apply the reliable sample filtering scheme (lines 3-6) to determine whether it will be used to update the model.
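The filtering step described above can be sketched as a simple entropy threshold. This is a minimal illustration, assuming the threshold E0 = 0.4 ln 1000 quoted in the experiment setup; the function and variable names are hypothetical, not taken from the SAR codebase.

```python
import math
import torch

# Hypothetical sketch of entropy-based reliable-sample filtering.
# E0 = 0.4 * ln(1000) is the threshold reported in the paper's setup
# (1000 = number of ImageNet classes).
E0 = 0.4 * math.log(1000)  # ~2.76 nats

def filter_reliable_samples(logits: torch.Tensor, e0: float = E0):
    """Return a boolean mask keeping only low-entropy (reliable) samples."""
    log_probs = logits.log_softmax(dim=1)
    # per-sample Shannon entropy of the predictive distribution
    entropy = -(log_probs.exp() * log_probs).sum(dim=1)
    return entropy < e0, entropy
```

Only samples whose mask entry is True would contribute to the model update; a maximally uncertain sample (uniform prediction, entropy = ln 1000) is filtered out.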
Open Source Code Yes The source code is available at https://github.com/mr-eggplant/SAR.
Open Datasets Yes We conduct experiments based on ImageNet-C (Hendrycks & Dietterich, 2019), a large-scale and widely used benchmark for out-of-distribution generalization. It contains 15 corruption types across 4 main categories (noise, blur, weather, digital), and each type has 5 severity levels.
Dataset Splits No The paper uses standard datasets such as ImageNet-C for evaluation but does not explicitly state train/validation/test splits with percentages or counts for its specific experimental setup during the test-time adaptation phase.
Hardware Specification Yes The real run time is tested via a single V100 GPU.
Software Dependencies No All adopted model weights are publicly available and obtained from the torchvision or timm repository (Wightman, 2019). (Does not specify version numbers for these libraries.)
Experiment Setup Yes For our SAR, we use SGD as the update rule, with a momentum of 0.9, batch size of 64 (except for the experiments with batch size = 1), and a learning rate of 0.00025/0.001 for ResNet/ViT models. The threshold E0 in Eqn. (2) is set to 0.4 ln 1000, following EATA (Niu et al., 2022a). ρ in Eqn. (3) is set to the default value of 0.05 from Foret et al. (2021).
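The quoted hyperparameters translate into a short PyTorch configuration. The snippet below is a sketch under the stated settings (SGD, momentum 0.9, the ResNet learning rate, and the EATA-style entropy threshold); the placeholder model is illustrative only.

```python
import math
import torch

# Placeholder model standing in for a ResNet/ViT backbone.
model = torch.nn.Linear(10, 1000)

# Reported settings: SGD with momentum 0.9; lr = 0.00025 for ResNet
# (the paper uses 0.001 for ViT models instead).
optimizer = torch.optim.SGD(model.parameters(), lr=0.00025, momentum=0.9)

# Entropy filtering threshold from Eqn. (2), following EATA.
E0 = 0.4 * math.log(1000)

# SAM neighborhood size rho from Eqn. (3), default from Foret et al. (2021).
rho = 0.05
```

In a full run, the batch size would be 64 (or 1 for the single-sample experiments), and the SAM-style sharpness-aware update would use `rho` when perturbing the weights.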