SoTTA: Robust Test-Time Adaptation on Noisy Data Streams

Authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation with standard TTA benchmarks under various noisy scenarios shows that our method outperforms state-of-the-art TTA methods in the presence of noisy samples and achieves accuracy comparable to those methods without noisy samples.
Researcher Affiliation | Collaboration | Taesik Gong (Nokia Bell Labs, taesik.gong@nokia-bell-labs.com); Yewon Kim, Taeckyung Lee, Sorn Chottananurak, and Sung-Ju Lee (KAIST, {yewon.e.kim,taeckyung,sorn111930,profsj}@kaist.ac.kr)
Pseudocode | Yes | Algorithm 1: High-confidence Uniform-class Sampling (HUS). A hedged sketch of the routine follows the table.
Open Source Code | Yes | The source code is available at https://github.com/taeckyung/SoTTA.
Open Datasets | Yes | We used three standard TTA benchmarks: CIFAR10-C, CIFAR100-C, and ImageNet-C [9] as our target datasets. CIFAR100 [15] consists of 50,000/10,000 training/test data with 100 classes. ImageNet [3] consists of 1,281,167/50,000 training/test data with 1,000 classes. MNIST [26] contains 60,000/10,000 training/test data with 10 classes. A hedged loading sketch for the corruption benchmarks follows the table.
Dataset Splits | No | The paper mentions 'training data' and 'test data' but does not explicitly specify a distinct validation split or its proportions/counts for reproducibility.
Hardware Specification | Yes | The experiments were performed on NVIDIA GeForce RTX 3090 and NVIDIA TITAN RTX GPUs.
Software Dependencies | No | The paper mentions software such as the Adam optimizer, TorchVision, and stochastic gradient descent, but does not provide version numbers for these or other key dependencies required for reproducibility.
Experiment Setup | Yes | We used a fixed hyperparameter of BN momentum m = 0.2 and updated the BN affine parameters via the Adam optimizer [14] with a fixed learning rate of l = 0.001 and a single adaptation epoch. The confidence threshold C0 is set to 0.99 for CIFAR10-C, 0.66 for CIFAR100-C, and 0.33 for ImageNet-C. We set the sharpness threshold ρ = 0.05 following previous works [4, 29]. We set the test batch size to 64 in all methods for a fair comparison. We set the memory size to 64 and adapted the model every 64 samples for our method and RoTTA [44]. A hedged sketch of this setup follows the table.
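
The HUS routine named in the Pseudocode row keeps a small memory of confidently predicted test samples while balancing classes. The following PyTorch-flavored sketch assumes per-sample softmax probabilities; the class name `HUSMemory`, the eviction rule, and the tie-breaking are illustrative assumptions, not the authors' Algorithm 1 verbatim.

```python
import torch


class HUSMemory:
    """Minimal sketch of High-confidence Uniform-class Sampling (HUS).

    Stores at most `capacity` test samples whose softmax confidence
    exceeds the threshold C0, evicting from the most populated class so
    the stored class distribution stays roughly uniform. The eviction and
    tie-breaking details are plausible simplifications, not the authors' code.
    """

    def __init__(self, capacity: int = 64, conf_threshold: float = 0.99):
        self.capacity = capacity
        self.conf_threshold = conf_threshold  # C0 in the paper (0.99 for CIFAR10-C)
        self.samples: list[tuple[torch.Tensor, int]] = []

    def add(self, x: torch.Tensor, probs: torch.Tensor) -> None:
        """Consider one test sample x with softmax probabilities `probs`."""
        conf, pseudo_label = probs.max(dim=-1)
        if conf.item() < self.conf_threshold:
            return  # low-confidence samples (likely noisy) are discarded
        y = int(pseudo_label)
        if len(self.samples) < self.capacity:
            self.samples.append((x, y))
            return
        # Memory full: evict one sample from the most populated class,
        # but only if the incoming class is less populated than it.
        counts: dict[int, int] = {}
        for _, c in self.samples:
            counts[c] = counts.get(c, 0) + 1
        majority = max(counts, key=counts.get)
        if counts.get(y, 0) < counts[majority]:
            idx = next(i for i, (_, c) in enumerate(self.samples) if c == majority)
            self.samples[idx] = (x, y)

    def batch(self) -> torch.Tensor:
        """Stack the stored samples for one adaptation step."""
        return torch.stack([x for x, _ in self.samples])
```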
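For the corruption benchmarks in the Open Datasets row, the standard CIFAR-10-C release ships one NumPy array per corruption (50,000 images, severities 1-5 stacked in blocks of 10,000) plus a shared labels file. The loader below is a minimal sketch under that assumption; the function name, paths, and defaults are illustrative.

```python
import numpy as np


def load_cifar10c(data_dir: str, corruption: str = "gaussian_noise", severity: int = 5):
    """Load one corruption/severity slice of CIFAR-10-C.

    Assumes the standard release layout: one .npy per corruption with
    50,000 uint8 images (10,000 per severity level 1-5) stacked along the
    first axis, and a shared labels.npy.
    """
    images = np.load(f"{data_dir}/{corruption}.npy")  # (50000, 32, 32, 3)
    labels = np.load(f"{data_dir}/labels.npy")        # (50000,)
    start = (severity - 1) * 10000
    return images[start:start + 10000], labels[start:start + 10000]
```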
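The quoted experiment setup maps onto a few lines of PyTorch: freeze the backbone, set BN momentum to 0.2, and optimize only the BN affine parameters with Adam at lr = 0.001. The helper below is a sketch of that configuration; the function name is an assumption, and the entropy-sharpness minimization (ρ = 0.05) that the paper applies on top of the base optimizer is omitted here.

```python
import torch
import torch.nn as nn


def configure_adaptation(model: nn.Module) -> torch.optim.Adam:
    """Sketch of the quoted setup: BN momentum m = 0.2, Adam over BN
    affine parameters only, lr = 0.001. Omits the paper's
    entropy-sharpness minimization (rho = 0.05)."""
    model.requires_grad_(False)  # freeze everything by default
    bn_params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
            module.momentum = 0.2            # fixed BN momentum m = 0.2
            module.requires_grad_(True)      # adapt affine weight/bias only
            bn_params += [module.weight, module.bias]
    return torch.optim.Adam(bn_params, lr=1e-3)
```

Per the quoted setup, adaptation would then run one epoch over a size-64 memory every 64 test samples.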