Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multimodal Situational Safety
Authors: Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Anderson Compalas, Dawn Song, Xin Wang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety, which explores how safety considerations vary based on the specific situation in which the user or agent is engaged. We argue that for an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. To evaluate this capability, we develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs. The dataset comprises 1,960 language query-image pairs; in half of them the image context is safe, and in the other half it is unsafe. We also develop an evaluation framework that analyzes key safety aspects, including explicit safety reasoning, visual understanding, and, crucially, situational safety reasoning. Our findings reveal that current MLLMs struggle with this nuanced safety problem in the instruction-following setting and struggle to tackle these situational safety challenges all at once, highlighting a key area for future research. Furthermore, we develop multi-agent pipelines to coordinately solve safety challenges, which shows consistent improvement in safety over the original MLLM response. |
| Researcher Affiliation | Academia | Kaiwen Zhou1, Chengzhi Liu1, Xuandong Zhao2, Anderson Compalas1, Dawn Song2, Xin Eric Wang1 — 1University of California, Santa Cruz; 2University of California, Berkeley |
| Pseudocode | No | The paper describes workflows for multi-agent systems using diagrams and textual descriptions of agent roles, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data: mssbench.github.io. |
| Open Datasets | Yes | To comprehensively evaluate the current MLLMs' situational safety performance, we introduce a Multimodal Situational Safety benchmark (MSSBench) with 1,960 language-image pairs. ... Initially, we randomly select 5,000 images I = {i1, ..., iN} from the COCO dataset (Lin et al., 2014) for each situational safety category, considering them as safe images. ... Code and data: mssbench.github.io. |
| Dataset Splits | Yes | Our dataset is a balanced dataset, with half of the data containing safe situations and half containing unsafe situations. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. It mentions evaluating different MLLMs and their versions, but not the computational infrastructure. |
| Software Dependencies | No | The paper lists various MLLMs (e.g., LLaVA-1.6, MiniGPT4-v2, Qwen-VL) and mentions using GPT-4o for categorization, but it does not specify version numbers for general programming languages or libraries (e.g., Python, PyTorch, TensorFlow) used in their implementation. |
| Experiment Setup | No | The paper describes evaluation settings like 'instruction following setting', 'query classification', and 'intent classification' with corresponding prompts. It also mentions using 'default settings' for open-source MLLMs. However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or training configurations for the models or the multi-agent system described. |
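The benchmark's central idea — scoring the same query separately under safe and unsafe visual contexts — can be sketched in a few lines. The record fields (`query`, `image_context`, `model_safe`) below are assumptions for illustration, not the benchmark's actual schema; see mssbench.github.io for the real data and evaluation framework.

```python
# Hypothetical sketch of per-context scoring on MSSBench-style data.
# Field names and the correctness criterion are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Pair:
    query: str          # language query shared across contexts
    image_context: str  # "safe" or "unsafe" visual situation
    model_safe: bool    # whether the model's response was judged appropriate

def situational_accuracy(pairs):
    """Accuracy per context: a response counts as correct when the model
    answers helpfully in a safe context and refuses or warns in an unsafe
    one (both collapsed here into the model_safe judgment)."""
    by_ctx = {"safe": [0, 0], "unsafe": [0, 0]}
    for p in pairs:
        correct, total = by_ctx[p.image_context]
        by_ctx[p.image_context] = [correct + p.model_safe, total + 1]
    return {ctx: c / t for ctx, (c, t) in by_ctx.items() if t}

# Toy balanced example mirroring the 50/50 safe-unsafe split.
pairs = [
    Pair("how do I sharpen this?", "safe", True),    # e.g., kitchen-knife photo
    Pair("how do I sharpen this?", "unsafe", False), # e.g., threatening scene
]
print(situational_accuracy(pairs))  # {'safe': 1.0, 'unsafe': 0.0}
```

Reporting the two contexts separately, rather than one pooled accuracy, exposes the failure mode the paper highlights: a model can look safe on aggregate while never adapting its answer to the unsafe situation.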