Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions

Authors: Zhuo Cao, Heming Du, Bingqing Zhang, Xin Yu, Xue Li, Sen Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We introduce a high-quality dataset called QVHighlights Multi-Moment Dataset (QV-M2), along with new evaluation metrics tailored for multi-moment retrieval (MMR). ... We retrain and evaluate 6 existing MR methods on QV-M2 and QVHighlights under both SMR and MMR settings. Results show that QV-M2 serves as an effective benchmark for training and evaluating MMR models, while FlashMMR provides a strong baseline. Specifically, on QV-M2, it achieves improvements over prior SOTA method by 3.00% on G-mAP, 2.70% on mAP@3+tgt, and 2.56% on mR@3.
Researcher Affiliation	Academia	Zhuo Cao1, Heming Du1, Bingqing Zhang1, Xin Yu1, Xue Li1, Sen Wang1; 1 The University of Queensland, Australia
Pseudocode	No	The paper describes the FlashMMR framework and its components using textual descriptions, mathematical equations, and a detailed architecture diagram (Figure 3). However, it does not include a specific section or figure explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format.
Open Source Code	Yes	Code is released at https://github.com/Zhuo-Cao/QV-M2.
Open Datasets	Yes	We introduce QV-M2 (QVHighlights [12] Multi Moment Dataset), an enhanced dataset based on QVHighlights. ... QV-M2 explicitly accounts for queries with multiple relevant moments, making it the first fully human-annotated dataset dedicated to MMR benchmarking. ... We include the dataset annotations in the supplementary material to support reproduction of our results.
Dataset Splits	No	The paper mentions "QV-M2 training" and "QV-M2 test set" in Table 3, and "QVHighlights validation set" in Tables 2 and 4. It also states "SMR verification experiments on QVHighlights are conducted on the validation set due to the unavailability of test set annotations." While these indicate the use of splits, specific details such as exact percentages, sample counts for each split, or a detailed splitting methodology are not explicitly provided in the main text.
Hardware Specification	Yes	All experiments are conducted on a single RTX 4090 GPU.
Software Dependencies	No	The paper mentions AdamW as the optimizer and refers to SlowFast [4] and CLIP [26] as encoders, but it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python version, PyTorch version).
Experiment Setup	Yes	The post-verification loss terms L_PV and L_repr are weighted at 9 and 7. We use AdamW as the optimizer and set the NMS threshold to 0.7 during inference.
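
The Experiment Setup row reports an NMS threshold of 0.7 at inference but does not say how suppression is applied. The sketch below is one common form of greedy temporal (1D) non-maximum suppression, given only as an illustration. The moment format `(start, end, score)`, the function names, and the rule "drop a candidate whose IoU with a kept moment exceeds the threshold" are assumptions, not details taken from the paper or its released code.

```python
from typing import List, Tuple

# Hypothetical moment format (an assumption): (start_sec, end_sec, score).
Moment = Tuple[float, float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(moments: List[Moment], iou_threshold: float = 0.7) -> List[Moment]:
    """Greedy NMS: visit moments by descending score and keep a moment
    only if its IoU with every already-kept moment is <= iou_threshold."""
    kept: List[Moment] = []
    for cand in sorted(moments, key=lambda m: m[2], reverse=True):
        if all(temporal_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    predictions = [(0.0, 10.0, 0.9), (1.0, 10.0, 0.8), (20.0, 30.0, 0.7)]
    # The second moment overlaps the first with IoU 0.9 and is suppressed.
    print(temporal_nms(predictions, iou_threshold=0.7))
```

With a threshold as high as 0.7, only near-duplicate predictions are removed, so several distinct moments for one query can survive, which is consistent with a multi-moment retrieval setting.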