Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

Authors: Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Juntao Dai, Yuanpei Chen, Yaodong Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate that policies aligned through our ISA exploration achieve: (I) an effective trade-off between safety and task performance, evidenced by an average 83.58% safety improvement over state-of-the-art method, while maintaining task performance (+3.85%); (II) strong safety assurance, particularly in mitigating long-tail risks and handling extreme failure scenarios, as supported by the elimination of high-risk actions and a drastic reduction in unsafe incident severity; and (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations.
Researcher Affiliation | Academia | 1 Institute for Artificial Intelligence, Peking University. 2 PKU-PsiBot Joint Lab. 3 State Key Laboratory of General Artificial Intelligence, Peking University. 4 Zhongguancun Academy. Author email: <EMAIL, EMAIL>.
Pseudocode | Yes | Algorithm 1: Corner Safety Component; Algorithm 2: Blind Spots Safety Component; Algorithm 3: Fragile Collection Safety Component; Algorithm 4: Critical Points Safety via Perturbation; Algorithm 5: Dangerous Equipment Safety Component.
Open Source Code | Yes | Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
Open Datasets | Yes | Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io. To the best of our knowledge, this work is the first systematic explorations into explicitly integrating safety constraints into VLAs using principles from Safe RL. Our main contributions are: ... Environment: Addressing the gap in comprehensive VLA safety assessment, we introduce Safety-CHORES. This novel testbed is a direct result of the modeling and eliciting aspects of our ISA.
Dataset Splits | Yes | For each task, houses from ProcTHOR are allocated into training and test sets in a 10:1 ratio, ensuring that testing is conducted on unseen houses. In the Safety-ObjNav evaluation experiment, the test scene comprised 200 houses with 200 corresponding tasks, while the other two tasks followed similar settings.
Hardware Specification | Yes | All our experiments are conducted on 8 NVIDIA H100 GPUs, using PyTorch 2.0.1, CUDA 12.2, and are performed on Ubuntu 20.04.2 LTS.
Software Dependencies | Yes | All our experiments are conducted on 8 NVIDIA H100 GPUs, using PyTorch 2.0.1, CUDA 12.2, and are performed on Ubuntu 20.04.2 LTS.
Experiment Setup | Yes | The cost threshold b_i is empirically set to 20% of the converged cost from the FLaRe baseline. This common Safe RL practice [74, 63] avoids arbitrary absolute values. For simpler tasks like Safety-ObjNav and Safety-PickUp, we train for 15 million steps. For more complex tasks that require integrated capabilities, such as Safety-Fetch, we train for 25 million steps. In Table 12, we provide a detailed list of the hyperparameters used during training.
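
The dataset split and cost threshold quoted above are simple enough to sketch in a few lines. The Python below is an illustration only, not the authors' code. The function names, the seed, the pool of 2200 house IDs, and the baseline cost of 5.0 are hypothetical values chosen for the example; only the 10:1 ratio and the 20% fraction come from the quoted text.

```python
import random


def split_houses(house_ids, train_parts=10, test_parts=1, seed=0):
    """Shuffle house IDs and split them train:test = train_parts:test_parts.

    Hypothetical helper: the quoted text gives only the 10:1 ratio,
    not how houses are assigned to each side.
    """
    ids = list(house_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a fixed seed
    n_test = len(ids) * test_parts // (train_parts + test_parts)
    # Test houses never appear in the training list, so evaluation is on unseen houses.
    return ids[n_test:], ids[:n_test]


def cost_threshold(baseline_converged_cost, fraction=0.2):
    """Cost limit set to a fraction (20% in the quoted setup) of a baseline's converged cost."""
    return fraction * baseline_converged_cost


# Illustrative pool size only; the real number of houses is not stated above.
train_ids, test_ids = split_houses(range(2200))
print(len(train_ids), len(test_ids))

# Illustrative baseline cost only; the real converged cost is not stated above.
print(cost_threshold(5.0))
```

Expressing the threshold relative to a baseline's converged cost, rather than as an absolute number, keeps the constraint comparable across tasks whose raw cost scales differ.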