Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
Authors: Youjia Zhang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across diverse benchmarks demonstrate that our method achieves state-of-the-art performance under a wide range of distribution shifts with superior scalability and robustness. |
| Researcher Affiliation | Collaboration | Youjia Zhang¹, Youngeun Kim², Young-Geun Choi¹, Hongyeob Kim¹, Huiling Liu¹, Sungeun Hong¹ (¹Sungkyunkwan University, ²Amazon) |
| Pseudocode | Yes | Algorithm 1 ADAPT: Online TTA... Algorithm 2 ADAPT: Transductive TTA |
| Open Source Code | No | The code will be fully publicly available upon acceptance of the paper. |
| Open Datasets | Yes | Dataset. We evaluated the proposed ADAPT on three different tasks: natural distribution shift, fine-grained categorization, and corruption robustness. Specifically, for natural distribution shift, we use multiple datasets including ImageNet [4], ImageNet-A [17], ImageNet-R [15], ImageNet-V [40], and ImageNet-Sketch [50]. The corruption robustness task is evaluated on ImageNet-C [16], which contains 15 corruption types covering noise, blur, weather, and digital artifacts. We also evaluate performance on 10 fine-grained recognition datasets: Aircraft [32], Caltech101 [8], Cars [22], DTD [3], EuroSAT [14], Flower102 [36], Food101 [1], Pets [37], SUN397 [53] and UCF101 [45]. |
| Dataset Splits | Yes | The datasets we have used are all publicly available, including the data splits. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set the coefficient α to 0.9, and assign the knowledge bank size L as 16 for online and 6 for transductive evaluation. |
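
The reported experiment setup can be captured as a small configuration object. The following is a minimal sketch, not the authors' code (which is stated to be unreleased): the names `AdaptConfig`, `make_config`, `alpha`, and `bank_size` are hypothetical, and only the values (α = 0.9, knowledge bank size L = 16 for online and 6 for transductive evaluation) come from the table above. The role of α is not described in this summary, so it is stored as an opaque coefficient.

```python
from dataclasses import dataclass

# Values reported in the "Experiment Setup" row; the names are illustrative only.
REPORTED_ALPHA = 0.9
REPORTED_BANK_SIZES = {"online": 16, "transductive": 6}


@dataclass(frozen=True)
class AdaptConfig:
    """Hypothetical container for the hyperparameters quoted in the summary."""
    mode: str         # "online" or "transductive"
    alpha: float      # coefficient α (its role is not stated in this summary)
    bank_size: int    # knowledge bank size L


def make_config(mode: str) -> AdaptConfig:
    """Build the reported configuration for the given evaluation mode."""
    if mode not in REPORTED_BANK_SIZES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(REPORTED_BANK_SIZES)}")
    return AdaptConfig(mode=mode, alpha=REPORTED_ALPHA, bank_size=REPORTED_BANK_SIZES[mode])
```

Calling `make_config("online")` or `make_config("transductive")` returns the corresponding reported settings; any other mode raises a `ValueError`, which keeps a reproduction attempt from silently running with an unreported bank size.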