Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Enhancing CLIP Robustness via Cross-Modality Alignment

Authors: Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we present the experimental results of our method under adversarial perturbations, including performance comparisons, ablation studies, and visualization analyses. (Section 4) and tables like Table 1: Classification accuracy (%) on 9 widely-used datasets.
Researcher Affiliation	Academia	(1) University of Science and Technology of China; (2) Nanyang Technological University. EMAIL, EMAIL
Pseudocode	Yes	The overall procedure of COLA is summarized in Algorithm 1, which outlines the projection-based alignment and OT-based matching steps for adversarially robust inference. (Section C Algorithm)
Open Source Code	Yes	Answer: [Yes] Justification: We have uploaded the codes in supplemental material.
Open Datasets	Yes	We evaluate our method on 14 classification datasets spanning a broad range of domains, including generic objects (ImageNet [14], Caltech101 [20]), scenes (SUN397 [58]), textures (DTD [10]), satellite imagery (EuroSAT [23]), and various fine-grained categories such as pets, cars, flowers, food, and aircraft (Pets [39], Cars [26], Flowers [38], Food101 [6], Aircraft [34]).
Dataset Splits	Yes	We evaluate our method on 14 classification datasets spanning a broad range of domains, including generic objects (ImageNet [14], Caltech101 [20]), scenes (SUN397 [58]), textures (DTD [10]), satellite imagery (EuroSAT [23]), and various fine-grained categories such as pets, cars, flowers, food, and aircraft (Pets [39], Cars [26], Flowers [38], Food101 [6], Aircraft [34]).
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA 3090 GPU if not specified.
Software Dependencies	No	Our experiments are based on the pre-trained CLIP model, using ViT-B/32 as the visual encoder and a Transformer as the text encoder. No specific software library versions (e.g., PyTorch, TensorFlow, CUDA) are mentioned.
Experiment Setup	Yes	The attack budgets, including the PGD attack and the CW attack [36, 7], are set to ϵa = 1/255 by default. The number of attack steps is set to 10. All attacks are bounded by an L∞ radius. For each test image, we generate N = 5 augmented views including the original. For each class, we use the LLM to generate M = 50 text descriptions. We select the top-C = 256 components from the SVD of class text features to build the projection matrix.
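
The Experiment Setup entry mentions building a projection matrix from the top-C components of an SVD of class text features. The paper's exact construction is given in its Algorithm 1 and is not reproduced on this page, so the following is only a minimal NumPy sketch of one common way to do this, under stated assumptions: the text features are stacked as rows of a matrix, the top-C right singular vectors span the subspace, and the projector is the product of that basis with its transpose. The function name `build_projection`, the random stand-in features, the class count of 10, and the feature dimension of 512 are all hypothetical.

```python
import numpy as np


def build_projection(text_features: np.ndarray, num_components: int = 256) -> np.ndarray:
    """Return a (d, d) orthogonal projector onto the span of the
    top `num_components` right singular vectors of `text_features`.

    text_features: array of shape (n_texts, d), one text embedding per row.
    This is an illustrative construction, not the paper's confirmed method.
    """
    # Economy SVD: rows of vt are orthonormal right singular vectors,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(text_features, full_matrices=False)
    basis = vt[:num_components]      # shape (C, d)
    return basis.T @ basis           # shape (d, d)


# Hypothetical stand-in data: 10 classes x M = 50 descriptions, d = 512.
rng = np.random.default_rng(0)
features = rng.standard_normal((10 * 50, 512))
features /= np.linalg.norm(features, axis=1, keepdims=True)

P = build_projection(features, num_components=256)

# Project a (random, stand-in) image feature onto the text subspace.
image_feature = rng.standard_normal(512)
projected = P @ image_feature
```

A projector built this way is symmetric and idempotent, so applying it twice gives the same result as applying it once; that property is a quick sanity check when experimenting with different values of C.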