Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation

Authors: Shohei Enomoto

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across twelve diverse image classification datasets with two different model architectures demonstrate that ACAVP achieves state-of-the-art accuracy among VP methods, surpasses linear probing in average accuracy, and exhibits superior robustness to distribution shifts, all while maintaining minimal computational overhead during inference. Our code is available at https://github.com/s-enmt/ACAVP.
Researcher Affiliation	Industry	Shohei Enomoto NTT Tokyo, Japan EMAIL
Pseudocode	No	The paper describes the proposed method in Section 3, using mathematical equations and textual descriptions, but does not present any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is available at https://github.com/s-enmt/ACAVP.
Open Datasets	Yes	We evaluated our method on 12 downstream classification tasks: CIFAR100, CIFAR10 [Krizhevsky et al., 2009], Flowers102 [Nilsback and Zisserman, 2008], Food101 [Bossard et al., 2014], EuroSAT [Helber et al., 2019], SUN397 [Xiao et al., 2010], DTD [Cimpoi et al., 2014], UCF101 [Soomro et al., 2012], SVHN [Netzer et al., 2011], Oxford Pets [Parkhi et al., 2012], GTSRB [Houben et al., 2013], and CLEVR [Johnson et al., 2017].
Dataset Splits	Yes	For dataset splits, we follow the CoOp protocol [Zhou et al., 2022b,a]. For datasets with predefined validation sets, we use them directly. For others, we randomly allocate 10% of the training data for validation and use the remaining 90% for training. We select the checkpoint with the highest validation accuracy for final evaluation. Detailed dataset information is provided in Table 9.
Hardware Specification	Yes	Table 8: Inference time comparison across different methods using H100 GPU with batch size 500.
Software Dependencies	No	The paper mentions using specific functions like 'torch.nn.utils.clip_grad_value_' which implies the use of PyTorch, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup	Yes	In our training setup, we used an initial learning rate of 40 with a Cosine Annealing Learning Rate Scheduler to gradually decrease the learning rate throughout training. The total number of epochs was set to 1000 for all experiments. We employed SGD as our optimizer with a momentum of 0.9. For most datasets, we used a batch size of 256, with the exception of DTD dataset where we used a batch size of 64 due to its smaller size.
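
The Dataset Splits and Experiment Setup rows above can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository. The function names `split_train_val` and `cosine_annealing_lr`, the fixed seed, the minimum learning rate of 0, and per-epoch stepping are all assumptions made for illustration; only the 90/10 split ratio, the initial learning rate of 40, and the 1000 epochs come from the quoted text.

```python
import math
import random


def split_train_val(num_samples, val_fraction=0.1, seed=0):
    """Randomly hold out a fraction of training indices for validation.

    The seed and the index-based interface are assumptions; the quoted text
    only states a random 10%/90% split.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    # First num_val shuffled indices become validation, the rest training.
    return indices[num_val:], indices[:num_val]


def cosine_annealing_lr(epoch, total_epochs=1000, initial_lr=40.0, min_lr=0.0):
    """Standard closed-form cosine annealing from initial_lr down to min_lr.

    min_lr=0.0 and stepping once per epoch are assumptions, not stated
    in the quoted setup.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + cosine)


train_idx, val_idx = split_train_val(1000)
print(len(train_idx), len(val_idx))  # 900 100
print(cosine_annealing_lr(0))        # 40.0
```

Under these assumptions the learning rate starts at 40, passes through roughly half that value at epoch 500, and approaches 0 at epoch 1000. The optimizer itself (SGD with momentum 0.9) and the batch sizes are not modeled here.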