Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Vid-SME: Membership Inference Attacks against Large Video Understanding Models

Authors: Qi Li, Runpeng Yu, Xinchao Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various self-trained and open-sourced VULLMs demonstrate the strong effectiveness of Vid-SME. Code is available here. ... We evaluate the performance of Vid-SME on various frame conditions, target datasets and target models. The results consistently demonstrate its strong effectiveness in inferring video membership in VULLMs.
Researcher Affiliation | Academia | Qi Li, Runpeng Yu, Xinchao Wang (National University of Singapore)
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, such as the definition of Sharma-Mittal entropy and its adaptive parameterization, and illustrates the overall pipeline in Figure 1. However, it does not include a distinct block or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code is available here.
Open Datasets | Yes | NExT-QA [25] is a video question answering dataset, while CinePile [38] is a video order reasoning dataset. ... The non-member set consists of all 502 instances from nine scenarios in the MLVU benchmark [66]... we instruction tune the model with video captioning data from Video-XL training set [45]... We use all the 1027 samples from the detailed captioning category in the VDC benchmark [7]... we use the video caption dataset Video-Instruct-100K [32].
Dataset Splits | Yes | For the NExT-QA dataset, we randomly sample 1070, 2140, and 4280 instances from both the training and testing splits to construct the member and non-member sets, respectively. This results in target datasets with three different scales (i.e., 2140, 4280 and 8560). Unless otherwise specified, we default to using 2140 instances for both members and non-members (4280 in total) in all the experiments.
Hardware Specification | Yes | The three self-trained models are trained on 8 A100 GPUs, while all experiments are conducted using 8 NVIDIA RTX A5000 GPUs.
Software Dependencies | No | The paper names base models such as 'Qwen2-7B-Instruct-224K' and 'clip-vit-large-patch14-336' in Table 5, but these are pretrained model checkpoints, not software dependencies. It does not list the libraries or versions (e.g., PyTorch 1.9, CUDA 11.1) needed to replicate the experiments.
Experiment Setup | Yes | The configurations and training trajectories of the three self-trained models are given in Appendix A. ... Table 5: Model configurations of the three self-trained models. Field: Video-XL-NExT-QA-7B ... Base LLM: Qwen2-7B-Instruct-224K ... Hidden Activation: silu ... Vision tower lr: 2e-6 ...
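The Pseudocode row notes that Vid-SME is defined in terms of the Sharma-Mittal entropy. For reference, the standard two-parameter Sharma-Mittal entropy of a discrete distribution p, with order q and degree r, is H_{q,r}(p) = (1/(1-r)) [ (sum_i p_i^q)^((1-r)/(1-q)) - 1 ]. It reduces to the Rényi entropy as r -> 1, to the Tsallis entropy when r = q, and to the Shannon entropy as both q and r go to 1. The Python sketch below computes only this textbook definition, using natural logarithms. It does not reproduce the paper's adaptive parameterization, and the function name `sharma_mittal_entropy` is hypothetical.

```python
import math
from typing import Sequence


def sharma_mittal_entropy(p: Sequence[float], q: float, r: float, eps: float = 1e-12) -> float:
    """Textbook Sharma-Mittal entropy H_{q,r}(p), in nats.

    Hypothetical helper for illustration only; it does NOT implement
    Vid-SME's adaptive choice of q and r.
    """
    if q <= 0:
        raise ValueError("order q must be positive")
    total = sum(p)
    if total <= 0 or any(x < 0 for x in p):
        raise ValueError("p must be non-negative with positive total mass")
    # Normalise, and drop zero-probability terms (0**q == 0 for q > 0).
    probs = [x / total for x in p if x > 0]

    shannon = -sum(x * math.log(x) for x in probs)
    q_is_one = math.isclose(q, 1.0, abs_tol=eps)
    r_is_one = math.isclose(r, 1.0, abs_tol=eps)

    if q_is_one and r_is_one:
        return shannon  # limit q, r -> 1: Shannon entropy
    if q_is_one:
        # limit q -> 1: (sum p^q)^(1/(1-q)) -> exp(H_Shannon)
        return (math.exp((1.0 - r) * shannon) - 1.0) / (1.0 - r)
    power_sum = sum(x ** q for x in probs)
    if r_is_one:
        return math.log(power_sum) / (1.0 - q)  # limit r -> 1: Rényi entropy
    return (power_sum ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)
```

For example, `sharma_mittal_entropy([0.5, 0.5], q=2.0, r=2.0)` returns 0.5, the order-2 Tsallis entropy of a fair coin. The limit cases are handled explicitly so that q = 1 or r = 1 does not divide by zero.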
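The Dataset Splits row describes drawing equal-sized member and non-member sets from the NExT-QA training and testing splits. A minimal sketch of that sampling step follows. The helper name `build_member_nonmember_sets`, the fixed seed, and the placeholder instance IDs are all assumptions; the paper's actual sampling code may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_member_nonmember_sets(
    train_split: Sequence[T],
    test_split: Sequence[T],
    n_per_side: int,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: sample n_per_side members from the training
    split and n_per_side non-members from the testing split."""
    if n_per_side > len(train_split) or n_per_side > len(test_split):
        raise ValueError("a split is smaller than the requested sample size")
    rng = random.Random(seed)  # fixed seed so the sampled sets are reproducible
    members = rng.sample(list(train_split), n_per_side)
    non_members = rng.sample(list(test_split), n_per_side)
    return members, non_members


if __name__ == "__main__":
    # Placeholder IDs (assumption); substitute real NExT-QA instance IDs.
    train_ids = [f"train_{i}" for i in range(5000)]
    test_ids = [f"test_{i}" for i in range(5000)]
    for n in (1070, 2140, 4280):
        members, non_members = build_member_nonmember_sets(train_ids, test_ids, n)
        # Total target-dataset size: 2140, 4280 and 8560 respectively.
        print(n, len(members) + len(non_members))
```

Sampling each side independently keeps the member and non-member sets the same size, so the three scales are simply twice the per-side count.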
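The Experiment Setup row quotes only a few fields from Table 5. One way to record them in a structured, reproducible form is a frozen dataclass, sketched below. The class and field names are hypothetical. The sketch also assumes that the quoted values belong to the Video-XL-NExT-QA-7B column, and it leaves out every field the source elides with "...".

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SelfTrainedModelConfig:
    """Partial record of one Table 5 column (hypothetical field names).

    Only values quoted in the Experiment Setup row are included; fields
    elided by "..." in the source are deliberately left out.
    """

    name: str
    base_llm: str
    hidden_activation: str
    vision_tower_lr: float


# Assumption: these quoted values belong to the Video-XL-NExT-QA-7B column.
VIDEO_XL_NEXT_QA_7B = SelfTrainedModelConfig(
    name="Video-XL-NExT-QA-7B",
    base_llm="Qwen2-7B-Instruct-224K",
    hidden_activation="silu",
    vision_tower_lr=2e-6,
)

if __name__ == "__main__":
    for key, value in asdict(VIDEO_XL_NEXT_QA_7B).items():
        print(f"{key}: {value}")
```

Making the dataclass frozen prevents a recorded configuration from being mutated accidentally between runs.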