Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

Authors: Xueyang Zhou, Guiyao Tie, Guowen Zhang, Hecheng Wang, Pan Zhou, Lichao Sun

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results on multiple VLA benchmarks demonstrate that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy. Further analyses confirm its robustness against common input perturbations, task transfers, and model fine-tuning, underscoring critical security vulnerabilities in current VLA deployments.
Researcher Affiliation Academia Xueyang Zhou¹, Guiyao Tie¹, Guowen Zhang¹, Hechang Wang¹, Pan Zhou¹, Lichao Sun²; ¹Huazhong University of Science and Technology; ²Lehigh University; EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Objective-Decoupled Optimization for Backdoor Injection
Require: Pretrained model f_θ; reference model f_ref; trigger transformation T; trigger dataset D_trigger = {(v_i, l_i)}; clean dataset D_clean = {(v_i, l_i, a_i)}; trade-off hyperparameter α; learning rate ϵ; training epochs N1, N2
Ensure: Backdoor-injected model f_θ
1: // Stage I: Trigger Injection via Reference-Aligned Optimization
2: Freeze θ_b, θ_a; initialize θ_p ← θ_p^ref
3: for t = 1 to N1 do
4:   for each (v_i, l_i) ∈ D_trigger do
5:     Generate triggered input v′_i ← T(v_i, δ)
6:     Compute clean feature h_i = f_p(v_i, l_i), triggered feature h_i^trigger = f_p(v′_i, l_i)
7:     Reference feature h_i^ref = f_p^ref(v_i, l_i)
8:     Compute trigger loss L_trig based on alignment and separation
9:     Update θ_p ← θ_p − ϵ ∇_{θ_p} L_trig
10: // Stage II: Clean Task Fine-tuning with Frozen Perception
11: Freeze θ_p; unfreeze θ_b, θ_a
12: for t = 1 to N2 do
13:   for each (v_i, l_i, a_i) ∈ D_clean do
14:     Predict action sequence: â_i ← f_θ(v_i, l_i)
15:     Compute clean-task loss L_clean = ℓ(â_i, a_i)
16:     Update θ_{b,a} ← θ_{b,a} − ϵ ∇_{θ_{b,a}} L_clean
17: return Final backdoor model f_θ
Open Source Code Yes Our code is available at: https://github.com/Zxy-MLlab/BadVLA.
Open Datasets Yes In the experiment, we selected four variants of the OpenVLA model [9] and SpatialVLA [18] which are currently the most influential open-source VLA models available, as the research subjects. Each variant was independently trained on different task suites from the LIBERO dataset [20], which are Spatial, Object, Goal, and Long (Details refer to Appendix A).
Dataset Splits No The paper mentions evaluating on different task suites from the LIBERO dataset and SimplerEnv, and using 'training samples' and 'clean data', but it does not specify explicit percentages or counts for train/test/validation splits for its experiments. It refers to a benchmark but doesn't provide the split methodology used.
Hardware Specification Yes All experiments are conducted on a distributed setup with 8 NVIDIA A800 GPUs.
Software Dependencies No The paper mentions using 'OpenVLA model [9]', 'SpatialVLA [18]', and 'LLaMA2-based language model [29]', which are specific models or frameworks. However, it does not specify versions for general software dependencies like Python, PyTorch, or CUDA, which are necessary for full reproducibility.
Experiment Setup Yes For OpenVLA variants, we adopt the proposed two-stage objective-decoupled training paradigm. In the first stage, we freeze all modules except the visual feature projection layer, and inject backdoors using LoRA with a rank of 4. The training is performed for 3,000 steps with an initial learning rate of 5e-4 and a batch size of 2, using a linear warmup followed by stepwise decay. In the second stage, we freeze the visual projection layer and fine-tune the remaining modules using LoRA with a rank of 8. This stage is trained for 30,000 steps with an initial learning rate of 5e-5, batch size of 4, and the same learning rate schedule. For the SpatialVLA model, we also follow a two-stage training process. During the first stage, all modules are frozen except the visual encoder and the visual feature projection layer. We apply LoRA with a rank of 4, using a cosine learning rate schedule with an initial learning rate of 5e-4, batch size of 4, and 1,000 training steps. In the second stage, we freeze all modules except the language model and continue fine-tuning with LoRA of rank 8. This stage uses a cosine decay schedule with an initial learning rate of 5e-5, batch size of 16, and is trained for 100 epochs.
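
Illustrative sketch (not the authors' code): the two-stage freeze/unfreeze schedule quoted in the Pseudocode row can be mimicked on a toy linear model in plain Python. Everything here is an assumption made for illustration: the two-channel input, the helper names `trigger`, `feature`, `stage_one`, and `stage_two`, the `margin` constant, and in particular the form of the trigger loss, since the quoted algorithm only says it is "based on alignment and separation". The sketch uses α to weight an alignment term against a hinge-style separation term.

```python
# Toy stand-in for the two-stage schedule; all names and constants are hypothetical.

def trigger(v):
    """Hypothetical trigger T: switch on the second input channel."""
    return (v[0], 1.0)

def feature(theta_p, v):
    """Toy perception module f_p: a linear map of a 2-channel input."""
    return theta_p[0] * v[0] + theta_p[1] * v[1]

def stage_one(theta_ref, d_trigger, alpha=0.5, margin=1.0, lr=0.05, epochs=50):
    """Stage I: only theta_p is updated; the action weight is never touched."""
    theta_p = list(theta_ref)  # initialise theta_p from the reference
    for _ in range(epochs):
        for v in d_trigger:
            v_trig = trigger(v)
            h = feature(theta_p, v)            # clean feature
            h_trig = feature(theta_p, v_trig)  # triggered feature
            h_ref = feature(theta_ref, v)      # reference feature
            gap = h_trig - h
            # Assumed loss: alpha * (h - h_ref)^2
            #             + (1 - alpha) * max(0, margin - gap^2)
            for k in range(2):
                grad = alpha * 2.0 * (h - h_ref) * v[k]
                if gap * gap < margin:  # separation term still active
                    grad -= (1.0 - alpha) * 2.0 * gap * (v_trig[k] - v[k])
                theta_p[k] -= lr * grad
    return theta_p

def stage_two(theta_p, d_clean, lr=0.05, epochs=200):
    """Stage II: theta_p is frozen; only the action weight theta_a is trained."""
    theta_a = 0.0
    for _ in range(epochs):
        for v, a in d_clean:
            h = feature(theta_p, v)  # frozen perception
            a_hat = theta_a * h      # predicted action
            theta_a -= lr * 2.0 * (a_hat - a) * h  # gradient of (a_hat - a)^2
    return theta_a

clean_inputs = [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0)]
theta_ref = [1.0, 0.1]
theta_p = stage_one(theta_ref, clean_inputs)
theta_a = stage_two(theta_p, [(v, 2.0 * v[0]) for v in clean_inputs])
```

In this toy, Stage I leaves the clean-channel weight at its reference value while pushing the triggered feature away from the clean one, and Stage II then fits the clean task without touching `theta_p`, so the separation learned in Stage I survives clean fine-tuning.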
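
For quick reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The `StageConfig` class, its field names, and the schedule labels are hypothetical and are not taken from the BadVLA repository. The `cosine_lr` helper assumes a plain cosine decay to zero with no warmup; the quoted text does not state a minimum learning rate or a warmup length.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage's settings."""
    lora_rank: int
    learning_rate: float
    batch_size: int
    schedule: str
    steps: Optional[int] = None   # set when the stage length is given in steps
    epochs: Optional[int] = None  # set when the stage length is given in epochs

# Values as quoted in the Experiment Setup row.
OPENVLA = (
    StageConfig(4, 5e-4, 2, "linear_warmup_step_decay", steps=3000),
    StageConfig(8, 5e-5, 4, "linear_warmup_step_decay", steps=30000),
)
SPATIALVLA = (
    StageConfig(4, 5e-4, 4, "cosine", steps=1000),
    StageConfig(8, 5e-5, 16, "cosine", epochs=100),
)

def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr to 0 (assumed form; no warmup)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

For example, `cosine_lr(0, 1000, 5e-4)` returns the initial rate for the first SpatialVLA stage, and the value falls to half of that at step 500.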