Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Authors: Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, Changqing Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments validate that Dota significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods. Extensive experiments on diverse datasets validate the effectiveness of the proposed method, demonstrating a significant improvement.
Researcher Affiliation	Academia	School of Information and Communication Engineering, Beijing University of Posts and Telecommunications; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; College of Intelligence and Computing, Tianjin University; School of Computer Science and Technology, Harbin Institute of Technology Shenzhen; Institute for Infocomm Research (I2R), A*STAR Research Entities (ARES); Show Lab, National University of Singapore
Pseudocode	Yes	Algorithm 1: The pseudocode of Dota.
Open Source Code	Yes	Code is available at https://github.com/skylineeeeen/DOTA.
Open Datasets	Yes	Aircraft [33], Caltech101 [12], Cars [29], DTD [7], EuroSAT [20], Flower102 [36], Food101 [4], Pets [37], SUN397 [49], and UCF101 [43]. ... ImageNet [10], ImageNet-A [22], ImageNet-R [21], ImageNet-S [46] and ImageNet-V2 [40]... Kather Colon dataset [26], PanNuke dataset [15], and the WSSS4LUAD dataset [18]
Dataset Splits	Yes	Table 15: Datasets details. Dataset ... Validation Size Test Size ... ImageNet N/A 50,000 ... Caltech101 1,649 2,465 ...
Hardware Specification	Yes	All experiments are conducted using a single NVIDIA RTX 4090 GPU and a 12-core Intel Xeon Platinum 8352V CPU.
Software Dependencies	No	The paper mentions building upon the pre-trained CLIP model but does not specify versions for any ancillary software libraries like PyTorch or Python.
Experiment Setup	Yes	Test-time adaptation is set for single-image scenarios, using a batch size of 1. ... We adjust σ² within [0.001, 0.002, 0.004], then search for the best η across [0.2, 0.3, 0.4, 0.5] and ρ across [0.005, 0.01, 0.02, 0.03], with the shrinkage parameter ϵ set to 0.0001. ... Another hyperparameter ω we consistently set to 0.001.
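
The search space quoted in the Experiment Setup row can be written out as a small configuration sketch. This is not code from the DOTA repository: the names `candidate_configs`, `select_best`, and `score_fn` are hypothetical. Enumerating the full Cartesian product is also an assumption, since the quoted text suggests a staged search (σ² first, then η and ρ).

```python
from itertools import product

# Values quoted in the Experiment Setup row.
SIGMA2_VALUES = [0.001, 0.002, 0.004]
ETA_VALUES = [0.2, 0.3, 0.4, 0.5]
RHO_VALUES = [0.005, 0.01, 0.02, 0.03]
EPSILON = 1e-4   # shrinkage parameter, fixed
OMEGA = 1e-3     # fixed in all experiments
BATCH_SIZE = 1   # single-image test-time adaptation


def candidate_configs():
    """Yield every (sigma2, eta, rho) combination with the fixed values attached.

    Assumption: a full grid; the paper may instead tune sigma2 before eta and rho.
    """
    for sigma2, eta, rho in product(SIGMA2_VALUES, ETA_VALUES, RHO_VALUES):
        yield {
            "sigma2": sigma2,
            "eta": eta,
            "rho": rho,
            "epsilon": EPSILON,
            "omega": OMEGA,
            "batch_size": BATCH_SIZE,
        }


def select_best(score_fn):
    """Return the config with the highest score.

    score_fn is a hypothetical callable mapping a config dict to a number,
    for example validation accuracy from a user-supplied evaluation loop.
    """
    return max(candidate_configs(), key=score_fn)
```

Under the full-grid assumption this enumerates 3 × 4 × 4 = 48 configurations, each of which would need one evaluation pass supplied through `score_fn`.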