Towards Stable Test-time Adaptation in Dynamic Wild World
Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Mingkui Tan
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Promising results demonstrate that SAR performs more stably over prior methods and is computationally efficient under the above wild test scenarios. We conduct experiments based on ImageNet-C (Hendrycks & Dietterich, 2019), a large-scale and widely used benchmark for out-of-distribution generalization. |
| Researcher Affiliation | Collaboration | Shuaicheng Niu (1,4), Jiaxiang Wu (2), Yifan Zhang (3), Zhiquan Wen (1), Yaofo Chen (1), Peilin Zhao (2), Mingkui Tan (1,5); sensc@mail.scut.edu.cn; mingkuitan@scut.edu.cn. Affiliations: (1) South China University of Technology, (2) Tencent AI Lab, (3) National University of Singapore, (4) Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, (5) Pazhou Laboratory |
| Pseudocode | Yes | B PSEUDO CODE OF SAR: In this appendix, we provide the pseudo-code of our SAR method. From Algorithm 1, for each test sample x_j, we first apply the reliable sample filtering scheme (refer to lines 3-6) to it to determine whether it will be used to update the model. (An illustrative Python reconstruction of this filtering step and the subsequent sharpness-aware update appears after the table.) |
| Open Source Code | Yes | The source code is available at https://github.com/mr-eggplant/SAR. |
| Open Datasets | Yes | We conduct experiments based on ImageNet-C (Hendrycks & Dietterich, 2019), a large-scale and widely used benchmark for out-of-distribution generalization. It contains corrupted images of 15 types across 4 main categories (noise, blur, weather, digital), and each type has 5 severity levels. (See the dataset-layout sketch after the table.) |
| Dataset Splits | No | The paper uses standard datasets such as ImageNet-C for evaluation but does not explicitly state train/validation/test splits with percentages or counts for its specific experimental setup during the test-time adaptation phase. |
| Hardware Specification | Yes | The real run time is tested on a single V100 GPU. |
| Software Dependencies | No | All adopted model weights are publicly available and obtained from the torchvision or timm repository (Wightman, 2019). (Does not specify version numbers for these libraries; see the loading sketch after the table.) |
| Experiment Setup | Yes | For our SAR, we use SGD as the update rule, with a momentum of 0.9, batch size of 64 (except for the batch size = 1 experiments), and a learning rate of 0.00025/0.001 for ResNet/ViT models. The threshold E_0 in Eqn. (2) is set to 0.4 × ln(1000), following EATA (Niu et al., 2022a). ρ in Eqn. (3) is set to the default value of 0.05 from Foret et al. (2021). (A sketch of this setup appears after the table.) |
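
The ImageNet-C benchmark cited in the Open Datasets row is distributed as one directory per corruption type, each holding one subdirectory per severity level. Below is a minimal sketch of iterating over that layout with torchvision's `ImageFolder`; the `root` path and preprocessing are assumptions, while the corruption names are the 15 standard ones from Hendrycks & Dietterich (2019).

```python
from pathlib import Path

from torchvision import transforms
from torchvision.datasets import ImageFolder

# The 15 standard ImageNet-C corruption types, grouped by category.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",                  # noise
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",         # blur
    "snow", "frost", "fog", "brightness",                             # weather
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",  # digital
]

# Assumption: ImageNet-C extracted under ./ImageNet-C as <type>/<severity>/<class>/*.JPEG
root = Path("ImageNet-C")
preprocess = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

for corruption in CORRUPTIONS:
    for severity in range(1, 6):  # severity levels 1-5
        dataset = ImageFolder(str(root / corruption / str(severity)), transform=preprocess)
        # ... run test-time adaptation / evaluation on `dataset`
```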
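Since the Software Dependencies row notes that library versions are unspecified, the following is a hedged example of obtaining public weights from torchvision and timm. The specific timm model name `vit_base_patch16_224` is an assumption standing in for the ViT variant used in the paper, and no version pins come from the source.

```python
# Assumed environment (the paper pins no versions):
#   pip install torch torchvision timm

import timm
from torchvision.models import resnet50, ResNet50_Weights

# Public ImageNet-pretrained weights from the two repositories named in the paper.
resnet = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
vit = timm.create_model("vit_base_patch16_224", pretrained=True)
```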
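The reported optimizer settings in the Experiment Setup row translate directly into code. A minimal sketch follows, assuming (as in common norm-layer test-time-adaptation practice, not necessarily the released SAR code verbatim) that only batch-norm affine parameters are updated:

```python
import math

import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Assumption: adapt only normalization-layer affine parameters, freezing the rest.
for p in model.parameters():
    p.requires_grad_(False)
trainable = []
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad_(True)
            trainable.append(p)

# Reported settings: SGD with momentum 0.9; lr 0.00025 for ResNet (0.001 for ViT).
optimizer = torch.optim.SGD(trainable, lr=0.00025, momentum=0.9)

E0 = 0.4 * math.log(1000)  # entropy threshold E_0, following EATA (Niu et al., 2022a)
RHO = 0.05                 # SAM neighborhood size rho (Foret et al., 2021)
BATCH_SIZE = 64            # 64 in the main experiments, 1 in the batch-size-1 study
```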
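Finally, the Pseudocode row describes entropy-based reliable-sample filtering followed by a sharpness-aware (SAM-style) entropy-minimization update. The sketch below is an illustrative reconstruction of that two-step update, not the authors' released implementation; `model`, `optimizer`, `E0`, and `RHO` are assumed to come from the setup sketch above.

```python
import torch


def softmax_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-sample entropy of the softmax prediction."""
    return -(logits.softmax(1) * logits.log_softmax(1)).sum(1)


def sar_step(model, x, optimizer, e0, rho):
    """One illustrative SAR-style update on a test batch `x`."""
    # (1) Reliable-sample filtering: keep only low-entropy predictions.
    logits = model(x)
    entropy = softmax_entropy(logits)
    mask = entropy < e0
    if not mask.any():
        return  # no reliable samples; skip the update for this batch
    entropy[mask].mean().backward()

    # (2) SAM ascent step: perturb the trainable weights toward higher loss
    # within an L2 ball of radius rho, then re-evaluate the loss there.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    eps = [p.grad * rho / (grad_norm + 1e-12) for p in params]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    optimizer.zero_grad()

    logits = model(x)
    entropy = softmax_entropy(logits)
    mask = entropy < e0
    if mask.any():
        entropy[mask].mean().backward()

    # (3) Undo the perturbation, then apply the actual SGD step using the
    # gradients computed at the perturbed point.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```

A typical loop would call `sar_step(model, x, optimizer, E0, RHO)` once per incoming test batch, matching the per-sample adaptation flow the pseudo-code excerpt describes.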