NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation

Authors: Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, Sung-Ju Lee

Venue: NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation with various datasets, including real-world non-i.i.d. streams, demonstrates that the proposed robust TTA not only outperforms state-of-the-art TTA algorithms in the non-i.i.d. setting, but also achieves comparable performance to those algorithms under the i.i.d. assumption.
Researcher Affiliation | Academia | Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee, KAIST, Daejeon, South Korea. {taesik.gong,jongheonj,maxkim139,yewon.e.kim,jinwoos,profsj}@kaist.ac.kr
Pseudocode | Yes | We detail the algorithm of PBRS as a pseudo-code in Algorithm 1: Prediction-Balanced Reservoir Sampling. (A PBRS sketch follows the table.)
Open Source Code | Yes | Code is available at https://github.com/TaesikGong/NOTE.
Open Datasets | Yes | We use CIFAR10-C, CIFAR100-C, and ImageNet-C [13] datasets that are common TTA benchmarks for evaluating the robustness to corruptions [29, 33, 41, 44, 4]. Both CIFAR10/CIFAR100 [19] have 50,000/10,000 training/test data. ImageNet [7] has 1,281,167/50,000 training/test data.
Dataset Splits | No | The paper does not explicitly mention a validation split for hyperparameter tuning or early stopping; it describes only training and test data.
Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A.
Software Dependencies | No | The paper mentions the PyTorch framework [31] and the Adam optimizer [18] but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | We set the test batch size as 64 and the adaptation epoch as one for adaptation, which is the most common setting among the baselines [33, 4, 41]. Similarly, we set the memory size N as 64 and adapt the model every 64 samples in NOTE to ensure a fair memory constraint. We conduct online adaptation and evaluation, where the model is continually updated. For the baselines, we adopt the best values for the hyperparameters reported in their papers or the official codes. We followed the guideline to tune the hyperparameters when such a guideline was available [44]. We use fixed values for the hyperparameters of NOTE, soft-shrinkage width = 4 and exponential moving average momentum m = 0.01, and update the affine parameters via the Adam optimizer [18] with a learning rate of l = 0.0001 unless specified. (A configuration sketch follows the table.)
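
The Pseudocode row refers to Algorithm 1, Prediction-Balanced Reservoir Sampling (PBRS). The following is a minimal Python sketch of that idea based on the paper's description, not the authors' reference implementation (see the linked repository for that); the class name, tie-breaking, and replacement details here are assumptions.

```python
import random
from collections import Counter

class PBRS:
    """Sketch of Prediction-Balanced Reservoir Sampling.

    Keeps a memory of at most `capacity` test samples that is (i) balanced
    across *predicted* classes and (ii) approximately time-uniform within
    each class via per-class reservoir sampling.
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.memory = []       # list of (sample, predicted_label) pairs
        self.seen = Counter()  # n_c: number of samples of predicted class c seen so far

    def add(self, sample, pred_label):
        self.seen[pred_label] += 1

        if len(self.memory) < self.capacity:
            self.memory.append((sample, pred_label))
            return

        counts = Counter(y for _, y in self.memory)
        max_count = max(counts.values())
        majority = [c for c, m in counts.items() if m == max_count]

        if pred_label not in majority:
            # Prediction-balancing: evict a random sample of a random majority class.
            victim_class = random.choice(majority)
            idx = random.choice(
                [i for i, (_, y) in enumerate(self.memory) if y == victim_class])
            self.memory[idx] = (sample, pred_label)
        else:
            # Reservoir sampling within the predicted class: keep the new sample
            # with probability m_c / n_c so stored samples of class c stay
            # roughly uniform over the stream; otherwise discard it.
            m_c = counts[pred_label]
            if random.random() < m_c / self.seen[pred_label]:
                idx = random.choice(
                    [i for i, (_, y) in enumerate(self.memory) if y == pred_label])
                self.memory[idx] = (sample, pred_label)
```

Balancing on predicted rather than ground-truth labels is the point of the design, since true labels are unavailable at test time; combining it with reservoir sampling keeps the memory both class-balanced and approximately uniform over a temporally correlated stream.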
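
The Experiment Setup row fixes most of the adaptation hyperparameters (test batch size 64, memory size N = 64, adaptation every 64 samples, EMA momentum m = 0.01, Adam with learning rate 0.0001 on the affine parameters). Below is a minimal PyTorch-style sketch of how such a setup could be wired, assuming that only normalization-layer affine parameters are adapted. The helper names are illustrative, and the standard BatchNorm/GroupNorm/LayerNorm check stands in for NOTE's Instance-Aware Batch Normalization layers, so this is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Adaptation hyperparameters quoted in the setup above.
BATCH_SIZE = 64       # test batch size
MEMORY_SIZE = 64      # N: capacity of the sampling memory
UPDATE_EVERY = 64     # adapt the model once every 64 test samples
EMA_MOMENTUM = 0.01   # m: exponential-moving-average momentum
LEARNING_RATE = 1e-4  # Adam learning rate for the affine parameters

def collect_affine_params(model: nn.Module):
    """Return only the affine (scale/shift) parameters of normalization layers;
    all other weights stay frozen during test-time adaptation."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
            if m.weight is not None:
                params.append(m.weight)
            if m.bias is not None:
                params.append(m.bias)
    return params

def make_adaptation_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # Adam over the normalization-layer affine parameters only, lr = 0.0001.
    return torch.optim.Adam(collect_affine_params(model), lr=LEARNING_RATE)
```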