Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
Authors: Shangkun Sun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, Wei Gao
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with multiple distinct editing prompts, editing results from 8 different models, and the corresponding Mean Opinion Scores (MOS) from 24 human annotators. Based on VE-Bench DB, we further propose VE-Bench QA, a quantitative human-aligned measurement for the text-driven video editing task. ... Detailed experiments demonstrate that VE-Bench QA achieves state-of-the-art alignment with human preferences, surpassing existing advanced metrics and VQA methods. |
| Researcher Affiliation | Academia | (1) Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, SECE, Shenzhen Graduate School, Peking University; (2) Peng Cheng Laboratory |
| Pseudocode | No | The paper describes methods and network architecture in text and diagrams (e.g., Figure 6), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/littlespray/VE-Bench |
| Open Datasets | Yes | Datasets https://openi.pcl.ac.cn/Open Datasets |
| Dataset Splits | Yes | Following the 10-fold method (Kou et al. 2023; Wu et al. 2023a; Sun et al. 2022), all models are trained with the initial learning rate of 1e-3 and the batch size of 8 on VE-Bench DB for 60 epochs. |
| Hardware Specification | Yes | We build all models via PyTorch and train them via NVIDIA V100 GPUs. |
| Software Dependencies | No | We build all models via PyTorch and train them via NVIDIA V100 GPUs. The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | all models are trained with the initial learning rate of 1e-3 and the batch size of 8 on VE-Bench DB for 60 epochs. Following DOVER (Wu et al. 2023a), we first fine-tuning the head for 40 epochs with linear probing, and then train all parameters for another 20 epochs. Adam (Kingma and Ba 2014) optimizer and a cosine scheduler are applied during training. |
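
The Experiment Setup row describes a two-phase schedule that is easy to misread, so a minimal sketch may help. It is plain Python rather than the authors' PyTorch code and only enumerates, per epoch, which parameters would be updated and what the learning rate would be. The numbers (1e-3, batch size 8, 60 epochs, 40 + 20 split) come from the quoted text. Everything else is an assumption made for illustration: the group names `head` and `backbone`, a single cosine curve spanning all 60 epochs, per-epoch stepping, no warmup, and decay toward zero. The paper excerpt does not confirm any of these details.

```python
import math
from typing import List, Tuple

# Values quoted in the Experiment Setup row.
BASE_LR = 1e-3
BATCH_SIZE = 8
TOTAL_EPOCHS = 60
LINEAR_PROBE_EPOCHS = 40  # head only; the remaining 20 epochs train all parameters


def trainable_groups(epoch: int) -> Tuple[str, ...]:
    """Return the parameter groups updated at a 0-indexed epoch.

    The group names "head" and "backbone" are hypothetical labels.
    """
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    if epoch < LINEAR_PROBE_EPOCHS:
        return ("head",)  # linear probing phase
    return ("backbone", "head")  # full fine-tuning phase


def cosine_lr(epoch: int, base_lr: float = BASE_LR,
              total_epochs: int = TOTAL_EPOCHS, min_lr: float = 0.0) -> float:
    """Assumed cosine annealing: base_lr at epoch 0, decaying toward min_lr."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def training_plan() -> List[Tuple[int, Tuple[str, ...], float]]:
    """List (epoch, trainable groups, learning rate) for the whole run."""
    return [(e, trainable_groups(e), cosine_lr(e)) for e in range(TOTAL_EPOCHS)]


if __name__ == "__main__":
    for epoch, groups, lr in training_plan():
        if epoch in (0, LINEAR_PROBE_EPOCHS - 1, LINEAR_PROBE_EPOCHS, TOTAL_EPOCHS - 1):
            print(f"epoch {epoch:2d}  groups={groups}  lr={lr:.6f}")
```

In an actual PyTorch run, the same plan would typically map to toggling `requires_grad` on the backbone parameters at epoch 40 and pairing `torch.optim.Adam` with a cosine scheduler; whether the authors restart the schedule at the phase boundary is not stated in the excerpt.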