Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Training-Free Test-Time Adaptation via Shape and Style Guidance for Vision-Language Models
Authors: Shenglong Zhou, Manjiang Yin, Leiyu Sun, Shicai Yang, Di Xie, Jiang Zhu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on out-of-distribution and cross-domain benchmark datasets demonstrate that our proposed SSG consistently outperforms previous state-of-the-art methods while also exhibiting promising computational efficiency. |
| Researcher Affiliation | Collaboration | ¹Hikvision Research Institute, ²University of Science and Technology of China |
| Pseudocode | No | The paper describes the method and its components (PPD, PPDsh, PPDst) in text and uses figures to illustrate concepts, but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | We use the public dataset as mentioned in Section 4, and we will release the codes after the final decision. |
| Open Datasets | Yes | Out-of-distribution benchmark aims to evaluate the model's robustness to natural distribution shifts on 4 ImageNet [33] variants, including ImageNet-A [34], ImageNet-V2 [35], ImageNet-R [36], and ImageNet-Sketch [37]. The cross-domain benchmark aims to evaluate the transferring performance on 10 diverse recognition datasets, including FGVCAircraft [38], Caltech101 [39], Stanford Cars [40], DTD [41], EuroSAT [42], Flowers102 [43], Food101 [44], Oxford Pets [45], SUN397 [46], and UCF101 [47]. |
| Dataset Splits | Yes | We follow the split in CoOp [16], and more details are shown in Appendix. |
| Hardware Specification | Yes | All experiments are conducted on a single 24GB NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper does not provide specific software version numbers. It mentions the general use of CLIP backbones, but no explicit versions for libraries like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | Following TPT [7] and TDA [9], we set the batch size as 1 and generate 63 augmented views for each test image, while setting the k as the top-10%. All experiments are conducted on a single 24GB NVIDIA RTX 4090 GPU. ...the utilised cache in SSG is a dynamic key-value cache, whose memory size is 3 for all datasets. |
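
Since the paper has no pseudocode and the code is not yet released, the quoted setup values can be hard to picture. The following is a minimal sketch of how they might be gathered into a configuration and how a "top-10%" view selection could work. It is not the authors' implementation. The names `SetupConfig`, `entropy`, and `select_confident_views` are hypothetical. Ranking views by prediction entropy is an assumption (a common reading of confidence-based selection in test-time adaptation), as is counting the original image alongside the 63 augmented views. The quoted text does not say what unit the cache "memory size" of 3 refers to, so it is stored as a bare number.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    batch_size: int = 1
    num_augmented_views: int = 63
    top_fraction: float = 0.10   # the quoted "top-10%"
    cache_memory_size: int = 3   # unit not stated in the quoted text


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident_views(view_probs, top_fraction):
    """Return indices of the lowest-entropy views, keeping at least one.

    Assumption: "top-10%" means the most confident views, ranked by entropy.
    """
    k = max(1, int(len(view_probs) * top_fraction))
    ranked = sorted(range(len(view_probs)), key=lambda i: entropy(view_probs[i]))
    return ranked[:k]


if __name__ == "__main__":
    cfg = SetupConfig()
    # Toy two-class predictions: 58 uncertain views, then 6 confident ones.
    # 64 views = 63 augmented views plus the original image (an assumption).
    views = [[0.5, 0.5]] * 58 + [[0.99, 0.01]] * 6
    print(select_confident_views(views, cfg.top_fraction))
```

With 64 views and a fraction of 0.10, this sketch keeps 6 views; a real pipeline would then aggregate the predictions of the kept views, which is outside what the quoted setup specifies.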