Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unleashing the Power of Visual Prompting At the Pixel Level

Authors: Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide extensive experimental results to demonstrate the effectiveness of our method. Using a CLIP model, our prompting method registers a new record of 82.5% average accuracy across 12 popular classification datasets, substantially surpassing the prior art by +5.2%. It is worth noting that such performance not only surpasses linear probing by +2.2%, but, in certain datasets, is on par with the results from fully fine-tuning. Additionally, our prompting method shows competitive performance across different data scales and against distribution shifts.
Researcher Affiliation | Collaboration | Junyang Wu* (Shanghai Jiao Tong University), Xianhang Li* (UC Santa Cruz), Chen Wei (Johns Hopkins University), Huiyu Wang (FAIR, Meta), Alan Yuille (Johns Hopkins University), Yuyin Zhou (UC Santa Cruz), Cihang Xie (UC Santa Cruz)
Pseudocode | No | The paper describes its methods in text and illustrates designs with figures (Figures 2 and 5), but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper neither states that source code for the described methodology is released nor links to a code repository. It only mentions that the implementation is based on PyTorch, a third-party library.
Open Datasets | Yes | We evaluate visual prompting methods on 12 downstream classification datasets, including CIFAR100, CIFAR10 (Krizhevsky et al., 2009), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), EuroSAT (Helber et al., 2019), SUN397 (Xiao et al., 2010), SVHN (Netzer et al., 2011), DTD (Cimpoi et al., 2014), Oxford Pets (Parkhi et al., 2012), Resisc45 (Cheng et al., 2017), CLEVR (Johnson et al., 2017), and DMLab (Beattie et al., 2016). In addition, we test the robustness of visual prompting on 3 out-of-distribution datasets (Koh et al., 2021) (Camelyon17, FMoW, and iWildCAM), and 2 corruption datasets (Hendrycks & Dietterich, 2018) (CIFAR100-C and CIFAR10-C).
Dataset Splits | Yes | In this section, we evaluate the performance of EVP with different data scales. We train EVP using only 1%, 4%, 7%, and 10% of the data for each class in the training datasets.
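The per-class subsampling described in this row (keeping a fixed fraction of each class, e.g. 4%) can be sketched in plain Python. The function name `stratified_subset`, the label layout, and the seed handling below are illustrative assumptions, not code from the paper:

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction, seed=0):
    """Select ~`fraction` of the sample indices for each class (at least one per class)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Example: 10 classes with 100 samples each; keeping 4% yields 4 indices per class.
labels = [c for c in range(10) for _ in range(100)]
subset = stratified_subset(labels, 0.04)
print(len(subset))  # 40
```

Sampling per class (rather than globally) keeps the low-data subsets class-balanced, which matters when fractions as small as 1% are used.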
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions the pre-trained models used.
Software Dependencies | No | The paper states 'Our implementation is based on Pytorch (Paszke et al., 2019)', which names the PyTorch library but gives no version number for it or for any other software dependency.
Experiment Setup | Yes | Our implementation is based on PyTorch (Paszke et al., 2019). We use CLIP-B/32, Instagram (Mahajan et al., 2018), and ResNet50 (He et al., 2016) as our pre-trained models, with batch sizes of 256, 32, and 128, respectively. All visual prompts in our experiments are trained for 1000 epochs. For EVP, we use SGD with a cosine learning rate schedule; the initial learning rate is 70. The prompting size is 30 pixels by default.
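The quoted schedule (SGD with cosine decay from an initial learning rate of 70 over 1000 epochs) matches the standard cosine annealing formula. The function below is a minimal stdlib sketch of that formula, assuming decay to zero and epoch-granularity updates; it is not the authors' code:

```python
import math

def cosine_lr(epoch, total_epochs=1000, base_lr=70.0, min_lr=0.0):
    """Cosine-annealed learning rate: base_lr at epoch 0, min_lr at total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))     # 70.0
print(cosine_lr(500))   # ~35.0 (halfway through decay)
print(cosine_lr(1000))  # ~0.0
```

An unusually large initial learning rate (70 here) is common when optimizing pixel-level prompts rather than network weights, since gradients with respect to input pixels are typically very small.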