Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains
Authors: Guillaume Vray, Devavrat Tomar, Xufeng Gao, Jean-Philippe Thiran, Evan Shelhamer, Behzad Bozorgtabar
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on scene-level corruption benchmarks (ImageNet-C, CIFAR-10/100-C), object-level style shifts (DomainNet-126, PACS), and semantic segmentation (Cityscapes→ACDC) covering recurring and continuously evolving domain shifts show that ReservoirTTA substantially improves adaptation accuracy and maintains stable performance across prolonged, recurring shifts, outperforming state-of-the-art methods. |
| Researcher Affiliation | Academia | 1EPFL 2CHUV 3UBC 4Vector Institute 5Aarhus University 1,2{firstname.lastname}@epfl.ch 3,EMAIL |
| Pseudocode | Yes | Model Prediction: Predictions are then obtained via the ensemble's parameters from all domain-specific models (see pseudocode in Appendix C). |
| Open Source Code | Yes | Our code is publicly available at https://github.com/LTS5/ReservoirTTA. |
| Open Datasets | Yes | Extensive experiments on scene-level corruption benchmarks (ImageNet-C, CIFAR-10/100-C), object-level style shifts (DomainNet-126, PACS), and semantic segmentation (Cityscapes→ACDC)... Note that CIFAR10-C, CIFAR100-C, and ImageNet-C are publicly available online (Apache-2.0 license). CCC is also provided by the RDumb paper [31] (MIT license). Both DomainNet-126 and PACS are publicly available. |
| Dataset Splits | Yes | Classification is tested under CCC [31], CSC, and CDC settings over 20 rounds (averaging error rates, %; a subset is shown for clarity). For segmentation, we follow the Cityscapes→ACDC protocol [42], where ACDC presents four weather conditions (Fog, Night, Rain, Snow) sequentially. We report the mean IoU (%) averaged over 10 repetitions. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA A100 Tensor Core GPU (80 GB VRAM) on our internal cluster. |
| Software Dependencies | No | All methods are re-implemented in PyTorch [29] within a unified TTA repository [24] for fair comparison, using pre-trained source models from RobustBench [11]. |
| Experiment Setup | Yes | For CIFAR-10-C and CIFAR-100-C, TTA baselines (except SAR [28]) are optimized with the Adam optimizer [17] using a learning rate of $1 \times 10^{-3}$, a universal $\beta = (0.9, 0.999)$, and a batch size of 200, whereas SAR employs SGD [32]. For ImageNet-C, models are adapted with SGD at a batch size of 64 and a learning rate of $2.5 \times 10^{-4}$ (adjusted to $1 \times 10^{-4}$ for ViT-B-16 in the CCC setting). For ReservoirTTA, we configure the system with a maximum of $K_{\max} = 16$ reservoirs, determine the threshold $\theta$ using 2000 source examples, and update centroids with AdamW [23] at a learning rate of $1 \times 10^{-4}$. Table 10 summarizes these settings. |
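
The settings quoted in the Experiment Setup row can be collected into a small lookup for anyone attempting a reproduction. The following is only a sketch: the names `AdaptationConfig`, `BASELINE_CONFIGS`, `RESERVOIR_SETTINGS`, and `get_config` are hypothetical and do not come from the paper or its repository, and only values stated in the row are included (the SAR learning rate, for example, is not quoted and is left out).

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class AdaptationConfig:
    """Optimizer settings for one benchmark (hypothetical container)."""
    optimizer: str
    learning_rate: float
    batch_size: int
    betas: Optional[Tuple[float, float]] = None  # only set for Adam


# Baseline settings as quoted in the Experiment Setup row.
# SAR uses SGD on CIFAR; its learning rate is not quoted, so it is omitted.
BASELINE_CONFIGS = {
    "cifar-10-c": AdaptationConfig("adam", 1e-3, 200, (0.9, 0.999)),
    "cifar-100-c": AdaptationConfig("adam", 1e-3, 200, (0.9, 0.999)),
    "imagenet-c": AdaptationConfig("sgd", 2.5e-4, 64),
}

# ReservoirTTA-specific settings quoted in the same row (key names are assumptions).
RESERVOIR_SETTINGS = {
    "max_reservoirs": 16,               # K_max
    "threshold_source_examples": 2000,  # examples used to set the threshold
    "centroid_optimizer": "adamw",
    "centroid_learning_rate": 1e-4,
}


def get_config(dataset: str, backbone: str = "", setting: str = "") -> AdaptationConfig:
    """Return the quoted baseline settings for a dataset.

    Applies the one override stated in the row: the learning rate becomes
    1e-4 for ViT-B-16 in the CCC setting on ImageNet-C.
    """
    key = dataset.lower()
    base = BASELINE_CONFIGS[key]
    if key == "imagenet-c" and backbone.lower() == "vit-b-16" and setting.lower() == "ccc":
        return replace(base, learning_rate=1e-4)
    return base
```

For instance, `get_config("ImageNet-C", "ViT-B-16", "CCC")` returns the SGD settings with the adjusted learning rate, while `get_config("CIFAR-10-C")` returns the Adam settings with a batch size of 200. Table 10 of the paper remains the authoritative source for the complete configuration.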