Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Authors: Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, Changqing Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate that Dota significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods. ... Extensive experiments on diverse datasets validate the effectiveness of the proposed method, demonstrating a significant improvement.
Researcher Affiliation | Academia | School of Information and Communication Engineering, Beijing University of Posts and Telecommunications; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; College of Intelligence and Computing, Tianjin University; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen; Institute for Infocomm Research (I2R), A*STAR Research Entities (ARES); Show Lab, National University of Singapore
Pseudocode | Yes | Algorithm 1: The pseudocode of Dota.
Open Source Code | Yes | Code is available at https://github.com/skylineeeeen/DOTA.
Open Datasets | Yes | Aircraft [33], Caltech101 [12], Cars [29], DTD [7], EuroSAT [20], Flower102 [36], Food101 [4], Pets [37], SUN397 [49], and UCF101 [43]. ... ImageNet [10], ImageNet-A [22], ImageNet-R [21], ImageNet-S [46] and ImageNet-V2 [40] ... Kather Colon dataset [26], PanNuke dataset [15], and the WSSS4LUAD dataset [18]
Dataset Splits | Yes | Table 15: Datasets details. Columns: Dataset ... Validation Size, Test Size ... ImageNet: N/A, 50,000 ... Caltech101: 1,649, 2,465 ...
Hardware Specification | Yes | All experiments are conducted using a single NVIDIA RTX 4090 GPU and a 12-core Intel Xeon Platinum 8352V CPU.
Software Dependencies | No | The paper mentions building upon the pre-trained CLIP model but does not specify versions of any ancillary software, such as Python or libraries like PyTorch.
Experiment Setup | Yes | Test-time adaptation is set for single-image scenarios, using a batch size of 1. ... We adjust σ² within [0.001, 0.002, 0.004], then search for the best η across [0.2, 0.3, 0.4, 0.5] and ρ across [0.005, 0.01, 0.02, 0.03], with the shrinkage parameter ϵ set to 0.0001. ... Another hyperparameter ω we consistently set to 0.001.
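The Experiment Setup excerpt describes a small hyperparameter search, which is easier to reason about when enumerated explicitly. The following is a minimal sketch, not code from the DOTA repository. It assumes that η and ρ are searched jointly for each σ² value (the excerpt does not spell out the exact search order), that ϵ and ω stay fixed, and that `evaluate` is a hypothetical caller-supplied function standing in for one full single-image (batch size 1) adaptation run that returns a score such as accuracy. The names `build_grid`, `grid_search`, and `evaluate` are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Search ranges quoted in the Experiment Setup row.
SIGMA2_VALUES = [0.001, 0.002, 0.004]
ETA_VALUES = [0.2, 0.3, 0.4, 0.5]
RHO_VALUES = [0.005, 0.01, 0.02, 0.03]

# Values reported as fixed rather than searched.
FIXED = {"epsilon": 0.0001, "omega": 0.001}

Config = Dict[str, float]


def build_grid() -> List[Config]:
    """Enumerate every (sigma2, eta, rho) combination with the fixed values attached.

    Assumption: eta and rho are searched jointly for each sigma2.
    """
    return [
        {"sigma2": s2, "eta": eta, "rho": rho, **FIXED}
        for s2, eta, rho in product(SIGMA2_VALUES, ETA_VALUES, RHO_VALUES)
    ]


def grid_search(evaluate: Callable[[Config], float]) -> Tuple[Config, float]:
    """Return the first config with the highest score under evaluate().

    `evaluate` is a hypothetical placeholder for one test-time adaptation
    run over a dataset; it is not part of any published API.
    """
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for cfg in build_grid():
        score = evaluate(cfg)
        if score > best_score:  # strict '>' keeps the earliest config on ties
            best_cfg, best_score = cfg, score
    assert best_cfg is not None  # the grid is never empty
    return best_cfg, best_score


def toy_score(cfg: Config) -> float:
    """Toy scorer for demonstration only; not a real accuracy measure."""
    return -abs(cfg["eta"] - 0.3) - abs(cfg["rho"] - 0.01)


if __name__ == "__main__":
    print(len(build_grid()))  # 48, i.e. 3 * 4 * 4 configurations
    best, _ = grid_search(toy_score)
    print(best["eta"], best["rho"])  # 0.3 0.01
```

Under the joint-search assumption the budget is 3 × 4 × 4 = 48 adaptation runs per dataset; a sequential search (fix σ², tune η, then ρ) would need fewer runs, which is one reason the exact search order matters when reproducing the reported numbers.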