SoTTA: Robust Test-Time Adaptation on Noisy Data Streams
Authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. |
| Researcher Affiliation | Collaboration | Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee (Nokia Bell Labs; KAIST); taesik.gong@nokia-bell-labs.com, {yewon.e.kim,taeckyung,sorn111930,profsj}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 High-confidence Uniform-class Sampling (HUS). (A hedged sketch of the sampling logic appears after this table.) |
| Open Source Code | Yes | The source code is available at https://github.com/taeckyung/SoTTA. |
| Open Datasets | Yes | We used three standard TTA benchmarks: CIFAR10-C, CIFAR100-C, and ImageNet-C [9] as our target datasets. CIFAR100 [15] consists of 50,000/10,000 training/test data with 100 classes. ImageNet [3] consists of 1,281,167/50,000 training/test data with 1,000 classes. MNIST [26] contains 60,000/10,000 training/test data with 10 classes. (A hedged loading sketch for the corruption benchmarks appears after this table.) |
| Dataset Splits | No | The paper mentions 'training data' and 'test data' but does not explicitly specify a distinct 'validation' split or its proportions/counts for reproducibility. |
| Hardware Specification | Yes | The experiments were performed on NVIDIA GeForce RTX 3090 and NVIDIA TITAN RTX GPUs. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, TorchVision, and stochastic gradient descent, but does not provide version numbers for these or any other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | We used a fixed hyperparameter of BN momentum m = 0.2 and updated the BN affine parameters via the Adam optimizer [14] with a fixed learning rate of l = 0.001 and a single adaptation epoch. The confidence threshold C0 is set to 0.99 for CIFAR10-C, 0.66 for CIFAR100-C, and 0.33 for ImageNet-C. We set the sharpness threshold ρ = 0.05 as previous works [4, 29]. We set the test batch size of 64 in all methods for a fair comparison. We set the memory size to 64 and adapted the model for every 64 samples for our method and RoTTA [44]. (A hedged configuration sketch appears after this table.) |
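
To make the Pseudocode row concrete: Algorithm 1 (HUS) filters incoming test samples by softmax confidence against the threshold C0 and maintains a fixed-size, class-balanced memory. Below is a minimal Python sketch of that idea; `HUSMemory`, `add`, and `batch` are our own illustrative names, not the authors' API, and the exact eviction rule in the paper may differ from this simplification.

```python
import random
from collections import defaultdict

import torch
import torch.nn.functional as F


class HUSMemory:
    """Sketch of High-confidence Uniform-class Sampling (HUS).

    Stores at most `capacity` samples, admits only predictions whose
    softmax confidence exceeds `conf_threshold` (C0 in the paper), and
    evicts from the most-populated class to stay roughly class-balanced.
    """

    def __init__(self, capacity=64, conf_threshold=0.99):
        self.capacity = capacity
        self.conf_threshold = conf_threshold
        self.per_class = defaultdict(list)  # class id -> stored samples

    def __len__(self):
        return sum(len(v) for v in self.per_class.values())

    def add(self, x, logits):
        conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
        if conf.item() < self.conf_threshold:
            return  # low-confidence (likely noisy) sample: discard
        if len(self) >= self.capacity:
            # Evict a random sample from the currently largest class so
            # the class distribution in memory stays close to uniform.
            largest = max(self.per_class, key=lambda c: len(self.per_class[c]))
            self.per_class[largest].pop(random.randrange(len(self.per_class[largest])))
        self.per_class[pred.item()].append(x)

    def batch(self):
        samples = [x for xs in self.per_class.values() for x in xs]
        return torch.stack(samples) if samples else None
```

Under the paper's settings this would run with `capacity=64` and a `conf_threshold` of 0.99 / 0.66 / 0.33 for CIFAR10-C / CIFAR100-C / ImageNet-C, adapting the model on `batch()` once every 64 incoming samples.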
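For the Open Datasets row, the corruption benchmarks trace back to Hendrycks and Dietterich [9]. The sketch below assumes the common public CIFAR-10-C release layout (one `<corruption>.npy` per corruption with the five severity levels stacked in order, plus a shared `labels.npy`); the paper itself does not describe file formats, so treat this as an assumption about the public release.

```python
from pathlib import Path

import numpy as np


def load_cifar10c(root, corruption="gaussian_noise", severity=5):
    """Load one corruption/severity slice of CIFAR-10-C.

    Assumed layout: each <corruption>.npy holds 50,000 uint8 images
    (10,000 per severity level, levels 1-5 stacked in order) and
    labels.npy holds the matching 50,000 labels.
    """
    root = Path(root)
    images = np.load(root / f"{corruption}.npy")   # (50000, 32, 32, 3)
    labels = np.load(root / "labels.npy")          # (50000,)
    lo, hi = (severity - 1) * 10000, severity * 10000
    return images[lo:hi], labels[lo:hi]
```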
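Finally, for the Experiment Setup row, the quoted hyperparameters can be collected into a single sketch. The `CONFIG` dictionary and the BN-affine-only parameter selection below are our reading of that paragraph, not the authors' configuration file.

```python
import torch

# Hyperparameters quoted in the paper (the confidence threshold C0
# differs per dataset).
CONFIG = {
    "bn_momentum": 0.2,          # fixed BN momentum m
    "lr": 1e-3,                  # Adam learning rate l
    "epochs_per_adaptation": 1,  # single adaptation epoch
    "batch_size": 64,            # test batch size (all methods)
    "memory_size": 64,           # memory size; adapt every 64 samples
    "conf_threshold": {"cifar10c": 0.99, "cifar100c": 0.66, "imagenetc": 0.33},
    "sharpness_rho": 0.05,       # sharpness threshold from prior work [4, 29]
}


def configure_model(model, cfg=CONFIG):
    """Freeze everything except BN affine parameters, per the quoted setup."""
    model.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.momentum = cfg["bn_momentum"]
            m.requires_grad_(True)
            params += [m.weight, m.bias]
    return torch.optim.Adam(params, lr=cfg["lr"])
```

Restricting the optimizer to the BN affine parameters mirrors the quoted "updated the BN affine parameters via the Adam optimizer" with a fixed learning rate.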