Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Mitigating Semantic Collapse in Partially Relevant Video Retrieval

Authors: WonJun Moon, MinSeok Jung, Gilhan Park, Tae-Young Kim, Cheol-Ho Cho, Woojin Jun, Jae-Pil Heo

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on PRVR benchmarks demonstrate that our framework effectively prevents semantic collapse and substantially improves retrieval accuracy.
Researcher Affiliation	Academia	Sungkyunkwan University EMAIL
Pseudocode	Yes	Algorithm 1 Order-Preserving Token Merging (OP-ToMe) ... Algorithm 2 Pre-computing the different levels of clip number (Eq. 9) ... Algorithm 3 Constructing merged clips for Adaptive CBVA
Open Source Code	Yes	Code will be released (https://github.com/admins97/MSC_PRVR).
Open Datasets	Yes	We evaluated our method on four PRVR datasets: QVHighlights [24], TVR [25], ActivityNet Captions [22], and Charades-STA [12].
Dataset Splits	Yes	TVR [25]... The training set contains 17,435 videos and 87,175 queries, while the evaluation set includes 2,179 videos and 10,895 queries. ActivityNet Captions [22]... The dataset includes 10,009 videos for training and 4,917 for evaluation. Charades-STA [12]... It consists of 13,898 video-sentence pairs for training and 4,233 for evaluation.
Hardware Specification	Yes	All experiments are conducted on a single RTX A6000 GPU and an Intel Xeon Gold 6338 CPU (2.00GHz) for all datasets.
Software Dependencies	No	For feature extraction, we follow recent works [5, 33, 32]; we extract video features with CLIP-B/32 [37] and Slowfast [10], and use CLIP-B for text embeddings for QVHighlights, and use CLIP-L [37] for encoding both modalities in other datasets. Hyperparameter configurations are adopted from GMMFormer-v2 [46] (e.g., learning rate, batch size, epochs, and optimizer settings) except for the fusing ratio between the frame and clip branches.
Experiment Setup	Yes	Hyperparameter configurations are adopted from GMMFormer-v2 [46] (e.g., learning rate, batch size, epochs, and optimizer settings) except for the fusing ratio between the frame and clip branches. We assign a frame score weight of 0.6 and a clip score weight of 0.4. All loss coefficients are fixed across datasets: λE = 15, λA = 30, and λCBVA = 0.1. To construct consistent clips with OP-ToMe, we set N to 75%... Finally, we set the minimum clip count per video to Cmin = 5, and set a similarity threshold τ to 0.7 for QVHighlights, 0.8 for TVR and ActivityNet-Captions, and 0.85 for Charades.
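
The values quoted in the Experiment Setup row can be collected into a small configuration, sketched below. This is an illustration only, not the authors' code: every identifier is hypothetical, and the fusion is assumed to be a plain weighted sum of the frame-branch and clip-branch scores, which the quoted text suggests but does not state explicitly.

```python
# Values quoted in the Experiment Setup row. All names are hypothetical.
FRAME_WEIGHT = 0.6          # frame score weight
CLIP_WEIGHT = 0.4           # clip score weight
MIN_CLIPS_PER_VIDEO = 5     # C_min
LOSS_COEFFICIENTS = {"lambda_E": 15, "lambda_A": 30, "lambda_CBVA": 0.1}
SIMILARITY_THRESHOLD = {    # tau, per dataset
    "QVHighlights": 0.7,
    "TVR": 0.8,
    "ActivityNet-Captions": 0.8,
    "Charades": 0.85,
}


def fuse_scores(frame_score: float, clip_score: float) -> float:
    """Combine branch scores. ASSUMPTION: a linear weighted sum."""
    return FRAME_WEIGHT * frame_score + CLIP_WEIGHT * clip_score


def rank_videos(frame_scores: dict, clip_scores: dict) -> list:
    """Return video ids sorted by fused score, highest first."""
    fused = {
        vid: fuse_scores(frame_scores[vid], clip_scores[vid])
        for vid in frame_scores
    }
    return sorted(fused, key=fused.get, reverse=True)
```

For example, `rank_videos({"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9})` places "a" first under these weights, because the frame branch counts for more than the clip branch. The remaining hyperparameters (learning rate, batch size, epochs, optimizer) are inherited from GMMFormer-v2 [46] and are not listed in the row, so they are left out here.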