Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Authors: Jayneel Parekh, Pegah KHAYATAN, Mustafa Shukor, Arnaud Dapogny, Alasdair Newson, Matthieu Cord
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first discuss generic experimental setup considerations (Section 4.1) to ensure reproducibility of the results. Then we present results for application of L2S for safety enforcement in MLLMs (Section 4.2) as well as hallucination mitigation (Section 4.3). Quantitative results: We report the safety steering results in Table 1. |
| Researcher Affiliation | Collaboration | (1) ISIR, Sorbonne Université, Paris, France; (2) Valeo.ai, Paris, France |
| Pseudocode | No | No explicit pseudocode or algorithm block is present in the paper. The methodology is described in Section 3. |
| Open Source Code | Yes | Our code is publicly available (footnotes 1, 2). Footnote 1: GitHub page: https://github.com/jayneelparekh/learn-to-steer |
| Open Datasets | Yes | The MMSafety Bench [36] database provides multimodal queries (image and text) to assess the security of MLLMs. For hallucination mitigation, we benchmark on the POPE dataset [28]. We further evaluate L2S on 500 randomly sampled images from the COCO validation set [29] |
| Dataset Splits | Yes | We use a random split of 80% of data for training/learning the steering vectors and 20% for testing. L2S is trained and tested on balanced subsets containing 70%, 10% and 20% of data for training, validation and test, respectively. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX5000 (24GB) GPU. |
| Software Dependencies | No | The paper mentions using specific models (LLaVA-v1.5-7B, Qwen2-VL-7B, Llama-Guard-3-8B, Gemini-2.0-Flash) but does not explicitly provide version numbers for underlying software libraries or programming languages like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The auxiliary network g_Θ for L2S consists in a single 2-layers MLP with hidden size 100, and is trained for 100 epochs using the Adam optimizer with either a learning rate of 10⁻⁴ or 5×10⁻⁵ as well as a batch size of 64. We use a cosine learning rate scheduler with warmup, followed by an adaptive scheduler that reduces the learning rate when the validation performance plateaus. |
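
The learning-rate schedule quoted in the Experiment Setup row (cosine decay with warmup, then a reduction when validation performance plateaus) can be illustrated with a minimal, library-free sketch. This is not the authors' implementation. The excerpt does not state the warmup length, the minimum learning rate, the reduction factor, or the patience, so `warmup_steps`, `min_lr`, `factor`, and `patience` below are hypothetical values chosen only for illustration, as are the names `warmup_cosine_lr` and `PlateauReducer`. Only the base learning rate of 10⁻⁴ comes from the excerpt.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100, min_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay.

    base_lr=1e-4 is one of the two rates quoted in the table.
    warmup_steps and min_lr are assumptions; the excerpt does not give them.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to at most 1.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class PlateauReducer:
    """Multiply the learning rate by `factor` once the validation metric has
    failed to improve for more than `patience` consecutive checks.

    factor and patience are assumptions; a higher metric is treated as better.
    """

    def __init__(self, lr, factor=0.5, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, val_metric):
        if val_metric > self.best:
            # Improvement: remember it and reset the counter.
            self.best = val_metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
            if self.bad_checks > self.patience:
                # Plateau detected: shrink the rate and start counting again.
                self.lr *= self.factor
                self.bad_checks = 0
        return self.lr
```

In this sketch, `warmup_cosine_lr` would be queried once per optimizer step during the first phase, and a `PlateauReducer` seeded with the final rate of that phase would then be stepped once per validation check. How the two phases are actually chained in the paper's code is not specified in the excerpt.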