reproducibilityindex.ai

SoTTA: Robust Test-Time Adaptation on Noisy Data Streams

Authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples.
Researcher Affiliation	Collaboration	Taesik Gong (Nokia Bell Labs, taesik.gong@nokia-bell-labs.com); Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee (KAIST, {yewon.e.kim,taeckyung,sorn111930,profsj}@kaist.ac.kr)
Pseudocode	Yes	Algorithm 1 High-confidence Uniform-class Sampling (HUS)
Open Source Code	Yes	The source code is available at https://github.com/taeckyung/SoTTA.
Open Datasets	Yes	We used three standard TTA benchmarks: CIFAR10-C, CIFAR100-C, and ImageNet-C [9] as our target datasets. CIFAR100 [15] consists of 50,000/10,000 training/test data with 100 classes. ImageNet [3] consists of 1,281,167/50,000 training/test data with 1,000 classes. MNIST [26] contains 60,000/10,000 training/test data with 10 classes.
Dataset Splits	No	The paper mentions 'training data' and 'test data' but does not explicitly specify a distinct 'validation' split or its proportions/counts for reproducibility.
Hardware Specification	Yes	The experiments were performed on NVIDIA GeForce RTX 3090 and NVIDIA TITAN RTX GPUs.
Software Dependencies	No	The paper mentions software such as the Adam optimizer, TorchVision, and stochastic gradient descent but does not provide specific version numbers for these or other key software dependencies required for reproducibility.
Experiment Setup	Yes	We used a fixed hyperparameter of BN momentum m = 0.2 and updated the BN affine parameters via the Adam optimizer [14] with a fixed learning rate of l = 0.001 and a single adaptation epoch. The confidence threshold C0 is set to 0.99 for CIFAR10-C, 0.66 for CIFAR100-C, and 0.33 for ImageNet-C. We set the sharpness threshold ρ = 0.05 as previous works [4, 29]. We set the test batch size of 64 in all methods for a fair comparison. We set the memory size to 64 and adapted the model for every 64 samples for our method and RoTTA [44].
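
To make the Experiment Setup row easier to reuse, the reported values can be collected in a small configuration object, as in the minimal Python sketch below. The numbers come from the row above. Every name (`SetupConfig`, `BalancedConfidentMemory`, `CONFIDENCE_THRESHOLD`) is hypothetical and is not taken from the authors' repository. The memory class is only one plausible reading of the name "High-confidence Uniform-class Sampling": it assumes that samples are stored when their confidence is strictly above C0 and that, once the memory is full, a random sample is evicted from one of the largest classes. This page does not spell out Algorithm 1, so the sketch should not be taken as the authors' method.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Confidence thresholds C0 per dataset, as reported in the Experiment Setup row.
CONFIDENCE_THRESHOLD = {"CIFAR10-C": 0.99, "CIFAR100-C": 0.66, "ImageNet-C": 0.33}


@dataclass(frozen=True)
class SetupConfig:
    """Hypothetical container for the reported hyperparameters."""
    dataset: str
    bn_momentum: float = 0.2          # m
    learning_rate: float = 0.001      # l, used with Adam on the BN affine parameters
    adaptation_epochs: int = 1
    sharpness_threshold: float = 0.05 # rho
    batch_size: int = 64
    memory_size: int = 64

    @property
    def confidence_threshold(self) -> float:
        return CONFIDENCE_THRESHOLD[self.dataset]


class BalancedConfidentMemory:
    """Assumed policy (not confirmed by this page): keep only confident samples
    and, when full, evict a random sample from one of the largest classes."""

    def __init__(self, capacity: int, threshold: float, seed: int = 0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.threshold = threshold
        self.by_class = defaultdict(list)  # predicted class -> stored samples
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return sum(len(samples) for samples in self.by_class.values())

    def add(self, sample, predicted_class: int, confidence: float) -> bool:
        """Store the sample if it is confident enough; return True when stored."""
        if confidence <= self.threshold:  # strict inequality is an assumption
            return False
        if len(self) >= self.capacity:
            largest = max(len(s) for s in self.by_class.values())
            candidates = [c for c, s in self.by_class.items() if len(s) == largest]
            # Prefer evicting from the incoming class when it is already among the largest.
            if predicted_class in candidates:
                victim_class = predicted_class
            else:
                victim_class = self.rng.choice(candidates)
            victims = self.by_class[victim_class]
            victims.pop(self.rng.randrange(len(victims)))
        self.by_class[predicted_class].append(sample)
        return True
```

With `cfg = SetupConfig("CIFAR10-C")`, a memory built as `BalancedConfidentMemory(cfg.memory_size, cfg.confidence_threshold)` rejects any sample whose confidence is 0.99 or lower and never holds more than 64 samples. How the stored samples are then used for adaptation is outside this sketch.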