Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment

Authors: Shangkun Sun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, Wei Gao

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with multiple distinct editing prompts, editing results from 8 different models, and the corresponding Mean Opinion Scores (MOS) from 24 human annotators. Based on VE-Bench DB, we further propose VE-Bench QA, a quantitative human-aligned measurement for the text-driven video editing task. ... Detailed experiments demonstrate that VE-Bench QA achieves state-of-the-art alignment with human preferences, surpassing existing advanced metrics and VQA methods.
Researcher Affiliation	Academia	(1) Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, SECE, Shenzhen Graduate School, Peking University; (2) Peng Cheng Laboratory
Pseudocode	No	The paper describes methods and network architecture in text and diagrams (e.g., Figure 6), but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code https://github.com/littlespray/VE-Bench
Open Datasets	Yes	Datasets https://openi.pcl.ac.cn/Open Datasets
Dataset Splits	Yes	Following the 10-fold method (Kou et al. 2023; Wu et al. 2023a; Sun et al. 2022), all models are trained with the initial learning rate of 1e-3 and the batch size of 8 on VE-Bench DB for 60 epochs.
Hardware Specification	Yes	We build all models via PyTorch and train them via NVIDIA V100 GPUs.
Software Dependencies	No	We build all models via PyTorch and train them via NVIDIA V100 GPUs. The paper mentions PyTorch but does not specify a version number.
Experiment Setup	Yes	all models are trained with the initial learning rate of 1e-3 and the batch size of 8 on VE-Bench DB for 60 epochs. Following DOVER (Wu et al. 2023a), we first fine-tuning the head for 40 epochs with linear probing, and then train all parameters for another 20 epochs. Adam (Kingma and Ba 2014) optimizer and a cosine scheduler are applied during training.
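
The reported setup (10-fold evaluation, 60 epochs, 40 epochs of head-only linear probing followed by 20 epochs of full training, initial learning rate 1e-3, cosine schedule) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code. Several details are assumptions that the quoted text does not state: that the cosine schedule spans all 60 epochs without restarting between phases, that it decays to a minimum rate of 0 and is stepped once per epoch, and that folds are formed by a seeded shuffle over sample indices. All function names (`k_fold_splits`, `cosine_lr`, `trainable_parts`, `build_schedule`) are hypothetical.

```python
import math
import random

# Values taken from the reported experiment setup.
INITIAL_LR = 1e-3
BATCH_SIZE = 8
TOTAL_EPOCHS = 60
PROBE_EPOCHS = 40   # head-only linear probing
NUM_FOLDS = 10


def k_fold_splits(num_items, num_folds=NUM_FOLDS, seed=0):
    """Yield (train_indices, test_indices) for each fold.

    Assumption: a seeded shuffle over sample indices; the paper's exact
    fold construction is not given in the quoted text.
    """
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def cosine_lr(epoch, total_epochs=TOTAL_EPOCHS, base_lr=INITIAL_LR, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 toward min_lr at total_epochs.

    Assumption: one schedule over the whole run, stepped once per epoch.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def trainable_parts(epoch, probe_epochs=PROBE_EPOCHS):
    """Return the parameter groups updated in a given 0-indexed epoch."""
    return ("head",) if epoch < probe_epochs else ("backbone", "head")


def build_schedule():
    """List the learning rate and trainable groups for every epoch."""
    return [
        {"epoch": e, "lr": cosine_lr(e), "trainable": trainable_parts(e)}
        for e in range(TOTAL_EPOCHS)
    ]
```

In a PyTorch implementation, the per-epoch entries from `build_schedule()` would typically decide which parameters have `requires_grad` enabled and which learning rate the Adam optimizer uses. The paper does not specify a PyTorch version, so no library calls are shown here.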