Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Robust Vision-Language Models from Natural Latent Spaces

Authors: Zhangyun Wang, Ni Ding, Aniket Mahanti

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on four benchmarks demonstrate that CoAPT achieves an excellent trade-off among natural generalization, adversarial robustness, and task-specific adaptation compared to state-of-the-art methods.
Researcher Affiliation	Academia	Zhangyun Wang, Ni Ding, and Aniket Mahanti, School of Computer Science, University of Auckland
Pseudocode	Yes	Algorithm 1 Natural-Latent-Guided Adversarial Prompt Learning
Open Source Code	Yes	For codes, we list the original paper of baseline methods in the appendix with access to their respective code repositories. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. To support reproducibility, we have included the anonymized source code in the supplementary materials for the review process. If the paper is accepted, we will release the complete codebase to the public.
Open Datasets	Yes	For datasets, we only use open-source datasets that are publicly available. We conduct a comprehensive evaluation of the proposed CoAPT method across four benchmark settings on 15 datasets spanning diverse vision tasks. For the evaluation of few-shot learning, base-to-novel class generalization, and zero-shot benchmarks, we adopt 11 image classification datasets, including EuroSAT [65] for satellite imagery, UCF101 [66] for action recognition, DTD [67] for texture classification, SUN397 [68] for scene recognition, Caltech101 [69] and ImageNet [70] for general object recognition, and FGVC Aircraft [71], Flowers102 [72], Oxford Pets [73], Food101 [74], and Stanford Cars [75] for fine-grained classification tasks.
Dataset Splits	Yes	Specifically, the models are trained on base classes with a 16-shot setting and jointly evaluated on the base classes and the novel unseen classes. The evaluation for each dataset and the corresponding statistical results are presented in Figure 3 and Table 2, respectively. Under few-shot settings we compare with FAP and baselines from its paper.
Hardware Specification	Yes	Details on compute resources are provided in the appendix. We additionally evaluate CoAPT on the CLIP ViT-B/16 architecture under the base-to-novel benchmark to verify its scalability to higher-resolution architectures in terms of both natural accuracy and adversarial robustness.
Software Dependencies	No	The paper implies the use of Python, as is standard in machine learning research, but it does not state version numbers for Python, PyTorch, CUDA, or any other critical libraries or solvers used in the implementation.
Experiment Setup	Yes	Our method is built upon the ViT-B/32 architecture of Vanilla CLIP. Each experiment is conducted three times with different random seeds, and the average results are reported. The convergence tolerance threshold in Adaptive-FGP is set to ξ = 1e-3, s = 3, and the maximum number of iterations is 30. The parameters of the regularization factor map γ(v) are set to μ_base = 0.1 and μ_gain = 1.2. We employed 2.5-order Rényi divergence regularization, with L_CoAPT coefficients set to κ1 = 8, κ2 = 1, κ3 = 1. Adversarial prompts with a length of 4 and a depth of 9 are applied to both the visual and textual branches. The RAdam optimizer with an initial learning rate of 0.00735 is adopted, and the batch size is set to 64. In contrast to the existing research work, we do not set proprietary hyperparameters for any of the benchmarks and datasets, in order to prove the generality of the proposed CoAPT.
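The Experiment Setup row lists the reported hyperparameters and names a 2.5-order Rényi divergence regularizer. The Python sketch below collects those reported values in one place and implements the standard discrete Rényi divergence, D_α(P‖Q) = 1/(α − 1) · log Σ_i p_i^α q_i^(1 − α). It is an illustration under stated assumptions, not the authors' code: the names CoAPTConfig, softmax, and renyi_divergence are hypothetical, the meaning of s is not given in the quoted text, and comparing clean versus adversarial class distributions is only an assumed use of the regularizer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CoAPTConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    backbone: str = "ViT-B/32"        # CLIP image encoder
    num_seeds: int = 3                # runs per experiment; results are averaged
    fgp_tolerance: float = 1e-3       # ξ, Adaptive-FGP convergence threshold
    fgp_s: int = 3                    # s (its meaning is not stated in the quote)
    fgp_max_iters: int = 30           # maximum Adaptive-FGP iterations
    mu_base: float = 0.1              # μ_base of the regularization factor map γ(v)
    mu_gain: float = 1.2              # μ_gain of the regularization factor map γ(v)
    renyi_order: float = 2.5          # α of the Rényi divergence regularizer
    kappa: tuple[float, float, float] = (8.0, 1.0, 1.0)  # κ1, κ2, κ3 in L_CoAPT
    prompt_length: int = 4            # adversarial prompt length
    prompt_depth: int = 9             # number of layers receiving prompts
    optimizer: str = "RAdam"
    learning_rate: float = 0.00735
    batch_size: int = 64


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def renyi_divergence(p, q, alpha):
    """Discrete Rényi divergence D_alpha(P || Q) for alpha > 0, alpha != 1.

    Assumes q_i > 0 wherever p_i > 0 (true for softmax outputs);
    terms with p_i == 0 contribute nothing and are skipped.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)


if __name__ == "__main__":
    cfg = CoAPTConfig()
    p_clean = softmax([2.0, 0.5, -1.0])  # illustrative logits, not real model outputs
    q_adv = softmax([1.0, 1.2, -0.5])
    print(renyi_divergence(p_clean, q_adv, cfg.renyi_order))
```

For identical distributions the divergence is zero, and for a fixed pair of distributions it does not decrease as α grows, so the order-2.5 value is at least as large as the KL divergence (the α → 1 limit).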