Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Vid-SME: Membership Inference Attacks against Large Video Understanding Models

Authors: Qi Li, Runpeng Yu, Xinchao Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on various self-trained and open-sourced VULLMs demonstrate the strong effectiveness of Vid-SME. Code is available here. ... We evaluate the performance of Vid-SME on various frame conditions, target datasets and target models. The results consistently demonstrate its strong effectiveness in inferring video membership in VULLMs.
Researcher Affiliation	Academia	Qi Li Runpeng Yu Xinchao Wang National University of Singapore EMAIL EMAIL
Pseudocode	No	The paper describes the methodology using textual explanations and mathematical formulations, such as the definition of Sharma-Mittal entropy and its adaptive parameterization, and illustrates the overall pipeline in Figure 1. However, it does not include a distinct block or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code	Yes	Code is available here.
Open Datasets	Yes	NExT-QA [25] is a video question answering dataset while CinePile [38] is a video order reasoning dataset. ... The non-member set consists of all 502 instances from nine scenarios in the MLVU benchmark [66]... we instruction tune the model with video captioning data from Video-XL training set [45]... We use all the 1027 samples from the detailed captioning category in the VDC benchmark [7]... we use the video caption dataset Video-Instruct-100K [32].
Dataset Splits	Yes	For the NExT-QA dataset, we randomly sample 1070, 2140, and 4280 instances from both the training and testing splits to construct the member and non-member sets, respectively. This results in target datasets with three different scales (i.e., 2140, 4280 and 8560). Unless otherwise specified, we default to using 2140 instances for both members and non-members (4280 in total) in all the experiments.
Hardware Specification	Yes	The three self-trained models are trained on 8 A100 GPUs, while all experiments are conducted using 8 NVIDIA RTX A5000 GPUs.
Software Dependencies	No	The paper mentions base models like 'Qwen2-7B-Instruct-224K' and 'clip-vit-large-patch14-336' in Table 5, which are model architectures. However, it does not provide specific software dependencies or library versions (e.g., PyTorch 1.9, CUDA 11.1) needed to replicate the experiments.
Experiment Setup	Yes	The configurations and training trajectories of the three self-trained models are given in Appendix A. ... Table 5: Model configurations of the three self-trained models. Field Video-XL-NExT-QA-7B ... Base LLM Qwen2-7B-Instruct-224K ... Hidden Activation silu ... Vision tower lr 2e-6 ...
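
The Dataset Splits row describes balanced member and non-member sets drawn from the NExT-QA training and testing splits at three scales. The sketch below illustrates that arithmetic only; it is not the authors' code. The function name `build_mia_split`, the placeholder ID pools, their sizes, and the fixed seed are all assumptions introduced for illustration.

```python
import random


def build_mia_split(train_ids, test_ids, n_per_class, seed=0):
    """Sample a balanced member / non-member set (hypothetical helper).

    Members are drawn from the training split and non-members from the
    testing split, each of size n_per_class, without replacement.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    members = rng.sample(list(train_ids), n_per_class)
    non_members = rng.sample(list(test_ids), n_per_class)
    return members, non_members


# Placeholder ID pools; the real split sizes are not stated in the report.
train_ids = [f"train_{i}" for i in range(10_000)]
test_ids = [f"test_{i}" for i in range(10_000)]

# The three per-class sample sizes quoted in the Dataset Splits row.
totals = []
for n in (1070, 2140, 4280):
    members, non_members = build_mia_split(train_ids, test_ids, n)
    totals.append(len(members) + len(non_members))

print(totals)
```

With 1070, 2140, and 4280 instances per class, the combined target datasets contain 2140, 4280, and 8560 instances, matching the three scales quoted in the table; the stated default of 2140 per class corresponds to the 4280-instance setting.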