Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
SoTTA: Robust Test-Time Adaptation on Noisy Data Streams
Authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. |
| Researcher Affiliation | Collaboration | Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee; Nokia Bell Labs, KAIST |
| Pseudocode | Yes | Algorithm 1 High-confidence Uniform-class Sampling (HUS) |
| Open Source Code | Yes | The source code is available at https://github.com/taeckyung/SoTTA. |
| Open Datasets | Yes | We used three standard TTA benchmarks: CIFAR10-C, CIFAR100-C, and ImageNet-C [9] as our target datasets. CIFAR100 [15] consists of 50,000/10,000 training/test data with 100 classes. ImageNet [3] consists of 1,281,167/50,000 training/test data with 1,000 classes. MNIST [26] contains 60,000/10,000 training/test data with 10 classes. |
| Dataset Splits | No | The paper mentions 'training data' and 'test data' but does not explicitly specify a distinct 'validation' split or its proportions/counts for reproducibility. |
| Hardware Specification | Yes | The experiments were performed on NVIDIA GeForce RTX 3090 and NVIDIA TITAN RTX GPUs. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, TorchVision, and stochastic gradient descent but does not provide specific version numbers for these or other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | We used a fixed hyperparameter of BN momentum m = 0.2 and updated the BN affine parameters via the Adam optimizer [14] with a fixed learning rate of l = 0.001 and a single adaptation epoch. The confidence threshold C0 is set to 0.99 for CIFAR10-C, 0.66 for CIFAR100-C, and 0.33 for ImageNet-C. We set the sharpness threshold ρ = 0.05 as previous works [4, 29]. We set the test batch size of 64 in all methods for a fair comparison. We set the memory size to 64 and adapted the model for every 64 samples for our method and RoTTA [44]. |
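
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which may help when attempting a reproduction. The following is a minimal sketch only: the class name `SoTTASetup`, the helper `make_setup`, and all field names are hypothetical and do not come from the authors' repository; only the numeric values are taken from the row above.

```python
from dataclasses import dataclass

# Confidence threshold C0 per target dataset, as quoted in the
# Experiment Setup row. The dictionary name is illustrative.
CONFIDENCE_THRESHOLDS = {
    "CIFAR10-C": 0.99,
    "CIFAR100-C": 0.66,
    "ImageNet-C": 0.33,
}


@dataclass(frozen=True)
class SoTTASetup:
    """Hypothetical container for the reported hyperparameters.

    Field names are assumptions made for this sketch; they are not
    taken from the authors' code.
    """

    dataset: str
    confidence_threshold: float       # C0, dataset-dependent
    bn_momentum: float = 0.2          # BN momentum m
    learning_rate: float = 0.001      # Adam learning rate l
    adaptation_epochs: int = 1        # single adaptation epoch
    sharpness_threshold: float = 0.05 # rho
    test_batch_size: int = 64
    memory_size: int = 64
    adapt_every_n_samples: int = 64


def make_setup(dataset: str) -> SoTTASetup:
    """Build the setup for one of the three reported target datasets."""
    if dataset not in CONFIDENCE_THRESHOLDS:
        raise ValueError(f"No reported confidence threshold for {dataset!r}")
    return SoTTASetup(
        dataset=dataset,
        confidence_threshold=CONFIDENCE_THRESHOLDS[dataset],
    )


if __name__ == "__main__":
    for name in CONFIDENCE_THRESHOLDS:
        print(make_setup(name))
```

Only the confidence threshold varies by dataset in the quoted setup, so the sketch keeps every other value as a default and rejects dataset names for which the row reports no threshold.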