Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Authors: Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs. Extensive experiments across various models are conducted to demonstrate that FOA-Attack consistently outperforms state-of-the-art methods, achieving remarkable performance even against closed-source MLLMs.
Researcher Affiliation | Collaboration | ¹Nanyang Technological University, Singapore; ²MBZUAI, United Arab Emirates; ³Sea AI Lab, Singapore; ⁴University of Illinois Urbana-Champaign, USA
Pseudocode | Yes | A. A Detailed Description of Our FOA-Attack. Following the M-Attack [33], we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack. The detailed description of the proposed FOA-Attack is shown in Algorithm 1. Algorithm 1: FOA-Attack
Open Source Code | Yes | The code is released at https://github.com/jiaxiaojunQAQ/FOA-Attack.
Open Datasets | Yes | Following previous works [13, 33], we use 1,000 clean images of size 224 × 224 × 3 from the NIPS 2017 Adversarial Attacks and Defenses Competition dataset¹. Additionally, we randomly select 1,000 images from the MSCOCO validation set [35] as target images. ¹ https://nips.cc/Conferences/2017/CompetitionTrack
Dataset Splits | No | Following previous works [13, 33], we use 1,000 clean images of size 224 × 224 × 3 from the NIPS 2017 Adversarial Attacks and Defenses Competition dataset¹. Additionally, we randomly select 1,000 images from the MSCOCO validation set [35] as target images. Explanation: The paper specifies the number of images used for attack evaluation (1,000 clean images and 1,000 target images from existing datasets) but does not define explicit training/validation/test splits of a dataset for model development or training, as the work focuses on an attack method rather than training a new core model.
Hardware Specification | Yes | All experiments are run on an Ubuntu system using an NVIDIA A100 Tensor Core GPU with 80GB of RAM.
Software Dependencies | No | All experiments are run on an Ubuntu system using an NVIDIA A100 Tensor Core GPU with 80GB of RAM. Explanation: The paper mentions the operating system (Ubuntu) and GPU hardware (NVIDIA A100), but does not provide specific version numbers for key software components such as Python, PyTorch, or CUDA libraries, which are necessary for full reproducibility.
Experiment Setup | Yes | Implementation Settings. Following [33], we adopt three CLIP variants, which include ViT-B/16, ViT-B/32, and ViT-g-14-laion2B-s12B-b42K, as surrogate models to generate adversarial examples. The perturbation budget ϵ is set to 16/255 under the ℓ∞ norm. The attack step size is set to 1/255. The number of attack iterations is set to 300. ... We set T = 1.0 and η = 0.2 as the default values in our experiments.
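
To make the quoted attack settings concrete, the following is a minimal sketch of an iterative attack constrained to an ℓ∞ ball, using only the three values stated in the Experiment Setup row (budget 16/255, step size 1/255, 300 iterations). Everything else is an assumption for illustration: the signed-gradient update rule is not stated in the row, the function names (linf_attack, toy_gradient) are hypothetical, and the squared-error loss on raw pixel values is a stand-in for the paper's feature optimal alignment loss computed on CLIP surrogates, which is not reproduced here.

```python
# Hyperparameters quoted in the Experiment Setup row.
EPSILON = 16 / 255    # l_inf perturbation budget
STEP_SIZE = 1 / 255   # attack step size
NUM_ITERS = 300       # number of attack iterations


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def toy_gradient(adv, target):
    """Gradient of the stand-in loss 0.5 * sum((adv - target)^2).

    Hypothetical placeholder: the paper's loss is defined on surrogate
    model features, not on raw pixel values.
    """
    return [a - t for a, t in zip(adv, target)]


def linf_attack(clean, target, grad_fn=toy_gradient,
                epsilon=EPSILON, step_size=STEP_SIZE, num_iters=NUM_ITERS):
    """Signed-gradient descent with projection onto the l_inf ball around `clean`."""
    adv = list(clean)
    for _ in range(num_iters):
        grad = grad_fn(adv, target)
        # Step against the gradient sign to reduce the targeted loss.
        adv = [a - step_size * sign(g) for a, g in zip(adv, grad)]
        # Project back into the epsilon-ball around the clean input.
        adv = [min(max(a, c - epsilon), c + epsilon) for a, c in zip(adv, clean)]
        # Keep values in the valid normalized pixel range [0, 1].
        adv = [min(max(a, 0.0), 1.0) for a in adv]
    return adv


# Toy "images" with three normalized pixel values each.
clean = [0.2, 0.5, 0.9]
target = [0.8, 0.5, 0.0]
adv = linf_attack(clean, target)
max_change = max(abs(a - c) for a, c in zip(adv, clean))
```

With these settings a pixel can reach the edge of the budget after 16 steps (16/255 divided by 1/255), so most of the 300 iterations are spent refining the perturbation inside the ball rather than growing it.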