Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation
Authors: Shohei Enomoto
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across twelve diverse image classification datasets with two different model architectures demonstrate that ACAVP achieves state-of-the-art accuracy among VP methods, surpasses linear probing in average accuracy, and exhibits superior robustness to distribution shifts, all while maintaining minimal computational overhead during inference. Our code is available at https://github.com/s-enmt/ACAVP. |
| Researcher Affiliation | Industry | Shohei Enomoto NTT Tokyo, Japan EMAIL |
| Pseudocode | No | The paper describes the proposed method in Section 3, using mathematical equations and textual descriptions, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/s-enmt/ACAVP. |
| Open Datasets | Yes | We evaluated our method on 12 downstream classification tasks: CIFAR100, CIFAR10 [Krizhevsky et al., 2009], Flowers102 [Nilsback and Zisserman, 2008], Food101 [Bossard et al., 2014], EuroSAT [Helber et al., 2019], SUN397 [Xiao et al., 2010], DTD [Cimpoi et al., 2014], UCF101 [Soomro et al., 2012], SVHN [Netzer et al., 2011], Oxford Pets [Parkhi et al., 2012], GTSRB [Houben et al., 2013], and CLEVR [Johnson et al., 2017]. |
| Dataset Splits | Yes | For dataset splits, we follow the CoOp protocol [Zhou et al., 2022b,a]. For datasets with predefined validation sets, we use them directly. For others, we randomly allocate 10% of the training data for validation and use the remaining 90% for training. We select the checkpoint with the highest validation accuracy for final evaluation. Detailed dataset information is provided in Table 9. |
| Hardware Specification | Yes | Table 8: Inference time comparison across different methods using H100 GPU with batch size 500. |
| Software Dependencies | No | The paper mentions using specific functions like 'torch.nn.utils.clip_grad_value_' which implies the use of PyTorch, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In our training setup, we used an initial learning rate of 40 with a Cosine Annealing Learning Rate Scheduler to gradually decrease the learning rate throughout training. The total number of epochs was set to 1000 for all experiments. We employed SGD as our optimizer with a momentum of 0.9. For most datasets, we used a batch size of 256, with the exception of DTD dataset where we used a batch size of 64 due to its smaller size. |
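
The Dataset Splits and Experiment Setup rows can be illustrated with a small, dependency-free sketch. It is not taken from the authors' repository. The learning rate, epoch count, momentum, batch sizes, and the 10%/90% split come from the table above. The minimum learning rate of 0, the annealing period equal to the full 1000 epochs, the random seed, and all function and constant names are assumptions made for illustration. The schedule uses the standard closed-form cosine annealing formula.

```python
import math
import random

# Values quoted in the Experiment Setup row.
INITIAL_LR = 40.0
TOTAL_EPOCHS = 1000
MOMENTUM = 0.9  # SGD momentum; listed for completeness, not used below
BATCH_SIZE = {"default": 256, "DTD": 64}

# Assumption: the table does not state a minimum learning rate.
ETA_MIN = 0.0


def cosine_annealing_lr(epoch, initial_lr=INITIAL_LR,
                        total_epochs=TOTAL_EPOCHS, eta_min=ETA_MIN):
    """Closed-form cosine annealing from initial_lr down to eta_min.

    Assumes the annealing period spans all training epochs.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (initial_lr - eta_min) * (1.0 + cosine)


def split_train_val(num_samples, val_fraction=0.1, seed=0):
    """Randomly hold out val_fraction of the sample indices for validation.

    Mirrors the 10%/90% rule described for datasets without a predefined
    validation set; the seed is a hypothetical choice.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(round(num_samples * val_fraction))
    return indices[num_val:], indices[:num_val]


def batch_size_for(dataset_name):
    """Return 64 for DTD and 256 for every other dataset."""
    return BATCH_SIZE.get(dataset_name, BATCH_SIZE["default"])
```

Under these assumptions, `cosine_annealing_lr(0)` returns the initial rate of 40, the rate falls to about half of that at epoch 500, and it reaches the assumed minimum at epoch 1000. `split_train_val(1000)` returns 900 training indices and 100 validation indices.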