Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Persistent Test-time Adaptation in Recurring Testing Scenarios
Authors: Trung-Hieu Hoang, Duc Minh Vo, Minh N. Do
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The supreme stability of PeTTA over existing approaches, in the face of lifelong TTA scenarios, has been demonstrated over comprehensive experiments on various benchmarks. Our project page is available at https://hthieu166.github.io/petta. |
| Researcher Affiliation | Academia | Trung-Hieu Hoang1 Duc Minh Vo2 Minh N. Do1,3 1Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign 2The University of Tokyo 3VinUni-Illinois Smart Health Center, VinUniversity EMAIL EMAIL |
| Pseudocode | Yes | Appdx. E.1 introduces the pseudocode of PeTTA. |
| Open Source Code | Yes | Our project page is available at https://hthieu166.github.io/petta. The source code of PeTTA is also attached as supplemental material. |
| Open Datasets | Yes | Specifically, CIFAR10 → CIFAR10-C, CIFAR100 → CIFAR100-C, and ImageNet → ImageNet-C [19] are three corrupted image classification tasks... Additionally, we incorporate DomainNet [44]... All the datasets, including CIFAR-10-C, CIFAR100-C and ImageNet-C [19] are publicly available online, released under Apache-2.0 license. |
| Dataset Splits | Yes | Following the practical TTA setup, multiple testing scenarios from each testing set will gradually change from one to another while the Dirichlet distribution (Dir(0.1) for CIFAR10-C, DomainNet, and ImageNet-C, and Dir(0.01) for CIFAR100-C) generates temporally correlated batches of data by category. For evaluation, an independent set of 2000 samples following the same distribution is used for computing the prediction frequency and the false negative rate (FNR). |
| Hardware Specification | Yes | A computer cluster equipped with an Intel(R) Core(TM) 3.80GHz i7-10700K CPU, 64 GB RAM, and one NVIDIA GeForce RTX 3090 GPU (24 GB VRAM) is used for our experiments. |
| Software Dependencies | No | We use PyTorch [43] for implementation. RobustBench [10] and torchvision [35] provide pre-trained source models. |
| Experiment Setup | Yes | Unless otherwise noted, for all PeTTA experiments, the EMA update rate for robust batch normalization [61] and feature embedding statistics is set to 5e-2; α0 = 1e-3 and the cosine similarity regularizer is used. On CIFAR10/100-C and ImageNet-C we use the self-training loss in [12] for LCLS and λ0 = 10, while the regular cross-entropy loss [13] and λ0 = 1 (severe domain shift requires prioritizing adaptability) are applied in DomainNet experiments. |
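The Dataset Splits row quotes a setup where a Dirichlet distribution with low concentration (Dir(0.1) or Dir(0.01)) produces temporally correlated, class-imbalanced test batches. A minimal sketch of that idea, not the authors' code, is shown below: each class's samples are split across time slots with Dirichlet-drawn proportions, so a low concentration makes each slot dominated by a few classes. All names (`dirichlet_batches`, the toy label array) are illustrative.

```python
# Sketch (assumption, not the paper's implementation) of Dirichlet-based
# temporally correlated test batches, as in the quoted practical TTA setup.
import numpy as np

def dirichlet_batches(labels, num_classes, concentration=0.1,
                      num_slots=10, seed=0):
    """Assign each class's sample indices across `num_slots` time slots
    with proportions drawn from Dir(concentration). Low concentration
    concentrates a class's mass in few slots -> correlated batches."""
    rng = np.random.default_rng(seed)
    slots = [[] for _ in range(num_slots)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-slot share of this class's samples.
        props = rng.dirichlet(np.full(num_slots, concentration))
        counts = (props * len(idx)).astype(int)
        start = 0
        for s, n in enumerate(counts):
            slots[s].extend(idx[start:start + n].tolist())
            start += n
    return slots

# Toy balanced test set: 10 classes x 100 samples.
labels = np.repeat(np.arange(10), 100)
slots = dirichlet_batches(labels, num_classes=10, concentration=0.1)
```

With `concentration=0.1`, inspecting the label histogram of each slot shows most slots are dominated by one or two classes, mimicking temporal class correlation at test time.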
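The Experiment Setup row quotes an EMA update rate of 5e-2 for batch-normalization and feature-embedding statistics. As a hedged illustration (the function name and usage are hypothetical, not from the paper), the standard exponential-moving-average update that rate plugs into is:

```python
# Hypothetical sketch of an EMA statistics update with rate 5e-2,
# as quoted in the Experiment Setup row; not the authors' code.
def ema_update(running, new, rate=5e-2):
    """Standard EMA: running <- (1 - rate) * running + rate * new."""
    return (1.0 - rate) * running + rate * new

# A small rate means running statistics drift slowly toward new batches.
mu = 0.0
for x in [1.0, 1.0, 1.0]:
    mu = ema_update(mu, x)
# mu has moved only a small fraction of the way toward 1.0
```

A small rate like 5e-2 keeps the running statistics stable against any single shifted test batch, which matches the stability focus of the quoted setup.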