Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping

Authors: Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, Shu-Tao Xia

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted over two benchmarks demonstrate the superior performance of BoostAdapter under test-time adaptation settings.
Researcher Affiliation | Academia | 1 Tsinghua University; 2 Shenzhen University; 3 Harbin Institute of Technology; 4 Peng Cheng Laboratory
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/taolinzhang/BoostAdapter
Open Datasets | Yes | The OOD benchmark evaluates the model's robustness to natural distribution shifts on 4 ImageNet [4] variants, including ImageNet-V2 [37], ImageNet-Sketch [44], ImageNet-A [14] and ImageNet-R [13]. We evaluate the transferring performance on 11 datasets in the Cross-Domain benchmark: Aircraft [31], Caltech101 [5], Cars [19], DTD [3], EuroSAT [11], Flower102 [32], Food101 [2], Pets [34], SUN397 [48], and UCF101 [42].
Dataset Splits | Yes | We follow the split in [55] and report the top-1 accuracy.
Hardware Specification | Yes | All our experiments are conducted with an Nvidia 3090 24GB GPU.
Software Dependencies | No | The paper mentions using 'a pre-trained ViT-B/16 of CLIP as the foundation model' but does not specify software dependencies, such as the programming language or libraries (e.g., PyTorch, TensorFlow), with version numbers.
Experiment Setup | Yes | In test-time adaptation, the batch size is set to be 1. ... we empirically set the entropy threshold percentile to p = 0.1 and filter 64 augmented views based on random cropping to obtain the boosting samples.
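
The Experiment Setup entry describes filtering 64 randomly cropped views with an entropy threshold percentile of p = 0.1. The following is a minimal sketch of what such a filter could look like, not the authors' implementation: it assumes that p means "keep the fraction p of views with the lowest prediction entropy", and the function names (`softmax`, `entropy`, `select_low_entropy_views`) are hypothetical. The logits would come from the CLIP model in practice; here they are plain Python lists so the example runs with the standard library alone.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_low_entropy_views(
    view_logits: Sequence[Sequence[float]], percentile: float = 0.1
) -> List[int]:
    """Return indices of the most confident views, lowest entropy first.

    ASSUMPTION: `percentile` is read as the fraction of views to keep.
    The source only states p = 0.1 and 64 augmented views; the exact
    selection rule used by BoostAdapter may differ.
    """
    entropies = [entropy(softmax(logits)) for logits in view_logits]
    n_keep = max(1, int(len(view_logits) * percentile))  # always keep one view
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i])
    return ranked[:n_keep]


# Toy input: 64 views over 3 classes; a larger first logit means a more
# confident (lower-entropy) prediction.
views = [[0.1 * i, 0.0, 0.0] for i in range(64)]
kept = select_low_entropy_views(views, percentile=0.1)
```

Under this reading, 64 views and p = 0.1 leave 6 views per test image. How the retained views are then used as boosting samples is outside the scope of this sketch.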