Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Authors: Zhuo Cao, Heming Du, Bingqing Zhang, Xin Yu, Xue Li, Sen Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a high-quality dataset called QVHighlights Multi-Moment Dataset (QV-M2), along with new evaluation metrics tailored for multi-moment retrieval (MMR). ... We retrain and evaluate 6 existing MR methods on QV-M2 and QVHighlights under both SMR and MMR settings. Results show that QV-M2 serves as an effective benchmark for training and evaluating MMR models, while FlashMMR provides a strong baseline. Specifically, on QV-M2, it achieves improvements over the prior SOTA method of 3.00% on G-mAP, 2.70% on mAP@3+tgt, and 2.56% on mR@3. |
| Researcher Affiliation | Academia | Zhuo Cao, Heming Du, Bingqing Zhang, Xin Yu, Xue Li, Sen Wang; all affiliated with The University of Queensland, Australia. |
| Pseudocode | No | The paper describes the FlashMMR framework and its components using textual descriptions, mathematical equations, and a detailed architecture diagram (Figure 3). However, it does not include a specific section or figure explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Code is released at https://github.com/Zhuo-Cao/QV-M2. |
| Open Datasets | Yes | We introduce QV-M2 (QVHighlights [12] Multi Moment Dataset), an enhanced dataset based on QVHighlights. ... QV-M2 explicitly accounts for queries with multiple relevant moments, making it the first fully human-annotated dataset dedicated to MMR benchmarking. ... We include the dataset annotations in the supplementary material to support reproduction of our results. |
| Dataset Splits | No | The paper mentions "QV-M2 training" and "QV-M2 test set" in Table 3, and "QVHighlights validation set" in Tables 2 and 4. It also states "SMR verification experiments on QVHighlights are conducted on the validation set due to the unavailability of test set annotations." While these indicate that splits are used, the main text does not give the split percentages, the sample counts per split, or the splitting methodology. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using AdamW as an optimizer and refers to SlowFast [4] and CLIP [26] as encoders, but it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | The post-verification loss terms L_PV and L_repr are weighted at 9 and 7. We use AdamW as the optimizer and set the NMS threshold to 0.7 during inference. |
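
The Experiment Setup row cites an NMS threshold of 0.7 at inference. As a rough illustration of what such a threshold typically controls in moment retrieval, the sketch below applies greedy non-maximum suppression to one-dimensional temporal segments. It is not the authors' implementation: the `(start, end, score)` tuple layout, the function names `temporal_iou` and `temporal_nms`, and the assumption that the threshold is applied to temporal IoU are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical layout for a predicted moment: (start_sec, end_sec, score).
Moment = Tuple[float, float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(moments: List[Moment], iou_threshold: float = 0.7) -> List[Moment]:
    """Greedy NMS: keep the highest-scoring moments, dropping any candidate
    whose IoU with an already-kept moment exceeds the threshold."""
    kept: List[Moment] = []
    for cand in sorted(moments, key=lambda m: m[2], reverse=True):
        if all(temporal_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    predictions = [(0.0, 10.0, 0.9), (1.0, 10.0, 0.8), (20.0, 30.0, 0.7)]
    # The second prediction overlaps the first with IoU 0.9 and is suppressed.
    print(temporal_nms(predictions, iou_threshold=0.7))
```

With a threshold of 0.7, only near-duplicate segments are removed, so several distinct moments for the same query can survive, which matters in a multi-moment setting.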