Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models

Authors: Junhao Dong, Cong Zhang, Xinghua Qu, Zejun MA, Piotr Koniusz, Yew Soon Ong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across various vision-language benchmarks validate the effectiveness of our method in numerous scenarios, demonstrating its plug-and-play applicability to large-scale VLMs.
Researcher Affiliation	Collaboration	1Nanyang Technological University, 2CFAR, IHPC, A*STAR, 3TikTok, Singapore, 4Bytedance, 5Data61 CSIRO, 6University of New South Wales, 7Australian National University
Pseudocode	No	The paper describes methods and formulations but does not contain any clearly labeled pseudocode or algorithm blocks. For example, Sections 3.3 and 3.4 describe processes in paragraph form and mathematical equations without structured algorithmic steps.
Open Source Code	No	Answer: [No] Justification: Detailed experimental setups for reproducing our results are provided in Appendix B. All the datasets used in the paper are publicly available.
Open Datasets	Yes	Datasets. In line with prior works [55, 63], we conduct adversarial learning on the ImageNet training set [13], with zero-shot classification evaluations on its test set and other 13 datasets. We further explore downstream task generalization across various datasets w.r.t. image captioning, visual question answering, object hallucination, and science question answering (see Appendix B.1). [...] Natural Object Recognition: STL-10 [10], CIFAR-10/100 [39], and Caltech-101 [26]. Fine-Grained Recognition: Stanf. Cars (Stanford Cars) [38], Oxford-IIIT Pets [58], Flower102 [57], and FGVC Aircraft [54]. Texture Recognition: DTD (Describable Textures Dataset) [9]. Remote Sensing Classification: EuroSAT [32]. Medical Image Diagnosis: PCAM (Patch CAMelyon) [67]. Robust Classification: ImageNet-R(endition) [33] and ImageNet-S(ketch) [68]. [...] For image captioning, evaluations are conducted on the COCO [45] and Flickr30k [59] datasets. For VQA, we assess performance on TextVQA [64], VQAv2 [28], and VizWiz [31]. Object hallucination is evaluated on the COCO dataset [45], while Chain-of-Thought (CoT) reasoning is assessed on the ScienceQA benchmark [52]
Dataset Splits	Yes	Datasets. In line with prior works [55, 63], we conduct adversarial learning on the ImageNet training set [13], with zero-shot classification evaluations on its test set and other 13 datasets. [...] Implementation details. [...] we adopt the CLIP model [60] with the ViT-Large/14 architecture, as in previous studies [55, 63]. [...] For robust weak-to-strong generalization of other CLIP architectures (see Table 3 in the main text), we consider the relatively strong-capacity student model of ResNet101 and ViT-Base/16 with the corresponding weak-capacity teacher model of ResNet50 and ViT-Base/32, respectively.
Hardware Specification	No	The paper mentions specific model architectures like CLIP ViT-Large/14, ResNet101, and ViT-Base/16, but does not provide specific hardware details such as GPU or CPU models, memory, or computing platforms used for running the experiments.
Software Dependencies	No	For network parameter optimization, we adopt the AdamW optimizer [50] with momentum coefficients (0.9, 0.95).
Experiment Setup	Yes	During adversarial weak-to-strong generalization, we conduct adversary generation via 10-step PGD [53] with perturbation radius ϵ = 2/255 and step size α = 1/255 in an unsupervised scheme (see Eq. (12)). We set the inverse adversarial perturbation radius as ϵ̌ = 2/255. For network parameter optimization, we adopt the AdamW optimizer [50] with momentum coefficients (0.9, 0.95). The adversarial weak-to-strong generalization is optimized with a cosine annealing learning rate schedule with a linear warm-up to the maximum learning rate of 1 × 10⁻⁵ for 2 epochs. During the warm-up period, we also integrate the objective function of FARE [63] (See Appendix A) with a tiny weighting factor of λ_warm-up = 0.2. For Parameter-Efficient Fine Tuning (PEFT) extension of our Adv-W2S method in Table 5, we adopt the Low-Rank Adaptation (LoRA) [35] framework, applied specifically to the attention modules.
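
The attack and schedule settings quoted in the Experiment Setup row can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: `pgd_linf`, `lr_at_step`, and `grad_fn` are hypothetical names; the attack is assumed to start from the clean input with no random start; pixel values are assumed to lie in [0, 1]; the learning rate is assumed to warm up linearly from near zero and decay to zero; and the warm-up and total lengths are left as parameters because the quoted text does not give them in steps. The unsupervised objective of Eq. (12) is not reproduced here, so the caller supplies its gradient.

```python
import math


def _sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x, grad_fn, eps=2 / 255, alpha=1 / 255, steps=10):
    """L-infinity PGD on a flat list of pixel values.

    grad_fn is a hypothetical callable returning the gradient of the
    attack objective with respect to the current adversarial input.
    """
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        # Signed-gradient ascent step of size alpha.
        x_adv = [xa + alpha * _sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # Keep pixel values in the valid range (assumption: [0, 1]).
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


def lr_at_step(step, total_steps, warmup_steps, max_lr=1e-5):
    """Linear warm-up to max_lr, then cosine annealing to zero (assumed floor)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the optimizer settings from the same row would typically correspond to `torch.optim.AdamW(params, lr=1e-5, betas=(0.9, 0.95))`, with `lr_at_step` applied per iteration; whether the authors step the schedule per iteration or per epoch is not stated in the quoted text.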