NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation

Authors: Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, Sung-Ju Lee

Venue: NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation with various datasets, including real-world non-i.i.d. streams, demonstrates that the proposed robust TTA not only outperforms state-of-the-art TTA algorithms in the non-i.i.d. setting, but also achieves comparable performance to those algorithms under the i.i.d. assumption.
Researcher Affiliation | Academia | Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee, KAIST, Daejeon, South Korea. {taesik.gong,jongheonj,maxkim139,yewon.e.kim,jinwoos,profsj}@kaist.ac.kr
Pseudocode | Yes | We detail the algorithm of PBRS as a pseudo-code in Algorithm 1: Prediction-Balanced Reservoir Sampling. (A PBRS sketch follows the table.)
Open Source Code | Yes | Code is available at https://github.com/TaesikGong/NOTE.
Open Datasets | Yes | We use CIFAR10-C, CIFAR100-C, and ImageNet-C [13] datasets that are common TTA benchmarks for evaluating the robustness to corruptions [29, 33, 41, 44, 4]. Both CIFAR10/CIFAR100 [19] have 50,000/10,000 training/test data. ImageNet [7] has 1,281,167/50,000 training/test data.
Dataset Splits | No | The paper does not explicitly mention a validation split for hyperparameter tuning or early stopping; it describes only training and test data.
Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A.
Software Dependencies | No | The paper mentions the PyTorch framework [31] and the Adam optimizer [18] but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | We set the test batch size as 64 and the adaptation epoch as one for adaptation, which is the most common setting among the baselines [33, 4, 41]. Similarly, we set the memory size N as 64 and adapt the model every 64 samples in NOTE to ensure a fair memory constraint. We conduct online adaptation and evaluation, where the model is continually updated. For the baselines, we adopt the best values for the hyperparameters reported in their papers or the official codes. We followed the guideline to tune the hyperparameters when such a guideline was available [44]. We use fixed values for the hyperparameters of NOTE, soft-shrinkage width = 4 and exponential moving average momentum m = 0.01, and update the affine parameters via the Adam optimizer [18] with a learning rate of l = 0.0001 unless specified. (A configuration sketch follows the table.)
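
The Pseudocode row refers to Algorithm 1, Prediction-Balanced Reservoir Sampling (PBRS). The following is a minimal Python sketch of that idea based on the paper's description, not the authors' reference implementation (see the linked repository for that); the class name, tie-breaking, and replacement details here are assumptions.

```python
import random
from collections import Counter

class PBRS:
    """Sketch of Prediction-Balanced Reservoir Sampling.

    Keeps a memory of at most `capacity` test samples that is (i) balanced
    across *predicted* classes and (ii) approximately time-uniform within
    each class via per-class reservoir sampling.
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.memory = []       # list of (sample, predicted_label) pairs
        self.seen = Counter()  # n_c: number of samples of predicted class c seen so far

    def add(self, sample, pred_label):
        self.seen[pred_label] += 1

        if len(self.memory) < self.capacity:
            self.memory.append((sample, pred_label))
            return

        counts = Counter(y for _, y in self.memory)
        max_count = max(counts.values())
        majority = [c for c, m in counts.items() if m == max_count]

        if pred_label not in majority:
            # Prediction-balancing: evict a random sample of a random majority class.
            victim_class = random.choice(majority)
            idx = random.choice(
                [i for i, (_, y) in enumerate(self.memory) if y == victim_class])
            self.memory[idx] = (sample, pred_label)
        else:
            # Reservoir sampling within the predicted class: keep the new sample
            # with probability m_c / n_c so stored samples of class c stay
            # roughly uniform over the stream; otherwise discard it.
            m_c = counts[pred_label]
            if random.random() < m_c / self.seen[pred_label]:
                idx = random.choice(
                    [i for i, (_, y) in enumerate(self.memory) if y == pred_label])
                self.memory[idx] = (sample, pred_label)
```

Balancing on predicted rather than ground-truth labels is the point of the design, since true labels are unavailable at test time; combining it with reservoir sampling keeps the memory both class-balanced and approximately uniform over a temporally correlated stream.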
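
The Experiment Setup row fixes most of the adaptation hyperparameters (test batch size 64, memory size N = 64, adaptation every 64 samples, EMA momentum m = 0.01, Adam with learning rate 0.0001 on the affine parameters). Below is a minimal PyTorch-style sketch of how such a setup could be wired, assuming that only normalization-layer affine parameters are adapted. The helper names are illustrative, and the standard BatchNorm/GroupNorm/LayerNorm check stands in for NOTE's Instance-Aware Batch Normalization layers, so this is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Adaptation hyperparameters quoted in the setup above.
BATCH_SIZE = 64       # test batch size
MEMORY_SIZE = 64      # N: capacity of the sampling memory
UPDATE_EVERY = 64     # adapt the model once every 64 test samples
EMA_MOMENTUM = 0.01   # m: exponential-moving-average momentum
LEARNING_RATE = 1e-4  # Adam learning rate for the affine parameters

def collect_affine_params(model: nn.Module):
    """Return only the affine (scale/shift) parameters of normalization layers;
    all other weights stay frozen during test-time adaptation."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
            if m.weight is not None:
                params.append(m.weight)
            if m.bias is not None:
                params.append(m.bias)
    return params

def make_adaptation_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # Adam over the normalization-layer affine parameters only, lr = 0.0001.
    return torch.optim.Adam(collect_affine_params(model), lr=LEARNING_RATE)
```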