Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing

Authors: Weihan Xu, Yimeng Ma, Jingyue Huang, Yang Li, Wenye Ma, Taylor Berg-Kirkpatrick, Julian McAuley, Paul Liang, Hao-Wen Dong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments to evaluate the effectiveness of our proposed models through objective evaluation metrics and a subjective survey. Our objective evaluations show that the proposed method can effectively insert short video clips while maintaining a coherent narrative. In a subjective survey, we show that our proposed method outperforms existing abstractive and extractive approaches in terms of coherence, alignment, and realism in documentary teaser generation.
Researcher Affiliation	Academia	Duke University; University of California San Diego; MBZUAI; Massachusetts Institute of Technology; University of Michigan
Pseudocode	No	No explicit pseudocode or algorithm block is present. The methodology is described in natural language and mathematical formulas, primarily in Sections 3.1 and 3.2.
Open Source Code	Yes	Video samples and all source code can be found on our website.
Open Datasets	Yes	In this work, we use the DocumentaryNet [11] dataset in our experiments. DocumentaryNet contains 1,269 documentaries paired with their teasers from three reliable sources: DW Documentary, Public Broadcasting Service (PBS) and National Geographic.
Dataset Splits	Yes	We reserve 5% of the dataset (49 documentaries) for final testing and allocate 10% of the remaining samples for validation.
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA A100 GPU with a batch size of 64.
Software Dependencies	No	No explicit software dependencies with specific version numbers (e.g., PyTorch 1.9, Python 3.8) are provided. The paper mentions using specific models like LLaMA [30], BART [32], GPT-4o [29], and WhisperX [16], and the Adam optimizer, but not their underlying software framework versions.
Experiment Setup	Yes	All experiments are conducted on a single NVIDIA A100 GPU with a batch size of 64. We reserve 5% of the dataset (49 documentaries) for final testing and allocate 10% of the remaining samples for validation. The learning rate is set to 1 × 10⁻⁵, and training terminates when either the generation or the retrieval validation loss does not decrease within 30 consecutive epochs. We use Adam optimizer for training.
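
The stopping rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the class name `DualLossEarlyStopper`, its `update` method, and the reading of "does not decrease within 30 consecutive epochs" as "no new minimum for 30 epochs in a row" are all assumptions made here.

```python
from dataclasses import dataclass, field


def _initial_best() -> dict:
    # Hypothetical helper: both losses start at +inf so the first epoch always improves.
    return {"generation": float("inf"), "retrieval": float("inf")}


def _initial_stale() -> dict:
    # Hypothetical helper: epochs elapsed since each loss last improved.
    return {"generation": 0, "retrieval": 0}


@dataclass
class DualLossEarlyStopper:
    """Signals a stop once EITHER validation loss stalls for `patience` epochs.

    The patience of 30 comes from the quoted setup; everything else is an
    assumption for illustration.
    """

    patience: int = 30
    best: dict = field(default_factory=_initial_best)
    stale: dict = field(default_factory=_initial_stale)

    def update(self, generation_loss: float, retrieval_loss: float) -> bool:
        """Record one epoch of validation losses; return True if training should stop."""
        losses = {"generation": generation_loss, "retrieval": retrieval_loss}
        for name, loss in losses.items():
            if loss < self.best[name]:
                # Strict decrease: new minimum, so reset the stall counter.
                self.best[name] = loss
                self.stale[name] = 0
            else:
                self.stale[name] += 1
        return any(count >= self.patience for count in self.stale.values())
```

In a training loop, `update` would be called once per epoch with the two validation losses, and the loop would break the first time it returns `True`. Under a different reading of the quoted sentence (for example, requiring both losses to stall), `any` would become `all`.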