Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking

Authors: Zixiang Zhao, Haowen Bai, Bingxin Ke, Yukun Cui, Lilun Deng, Yulun Zhang, Kai Zhang, Konrad Schindler

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that UniVF achieves state-of-the-art results across all tasks on VF-Bench. Project page: vfbench.github.io. [...] 5 Experiments: We now evaluate our UniVF on VF-Bench for all four fusion scenarios: multi-exposure fusion (MEF), multi-focus fusion (MFF), infrared-visible fusion (IVF), and medical video fusion (MVF). |
| Researcher Affiliation | Academia | 1 ETH Zürich; 2 Xi'an Jiaotong University; 3 Shanghai Jiao Tong University; 4 Nanjing University |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (e.g., Equations 1-9) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: vfbench.github.io. [...] We have provided instructions for reproduction details in the main paper and Appendix. The dataset, code and pretrained models are publicly available to ensure faithful reproduction of the results. |
| Open Datasets | Yes | We introduce Video Fusion Benchmark (VF-Bench), the first comprehensive benchmark covering four video fusion tasks: multi-exposure, multi-focus, infrared-visible, and medical fusion. [...] The dataset, code and pretrained models are publicly available to ensure faithful reproduction of the results. |
| Dataset Splits | Yes | From >2000 candidates, we manually curated 500 scenes with an average of 150 frames, choosing those with rich visual content, vivid colors, and free from watermarks or video effects. These were further divided into 450 for training and 50 for testing. [...] Specifically, 150 videos are split into 120 training scenes and 30 test scenes, with an average length of 70 frames. [...] Through this process, we curate a total of 90 video scenes with, on average, 300 frames. These are randomly split into 75 training scenes and 15 testing scenes. [...] As a result, we curate a total of 57 scenes with 27 frames on average, which are divided into 49 for training and 8 for testing. |
| Hardware Specification | Yes | We ran our experiments on a machine equipped with a single NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using 'Adam' for optimization and 'Restormer blocks' for network architecture but does not specify version numbers for any software components (e.g., Python, PyTorch, CUDA) required for replication. |
| Experiment Setup | Yes | Training details. We ran our experiments on a machine equipped with a single NVIDIA GeForce RTX 4090 GPU. The loss is minimized with Adam, starting with a learning rate of $10^{-4}$ that decays exponentially to 1% of its initial value over the course of 20k iterations. Training uses a batch size of 32, with gradient accumulation. As our network architecture, we adopt Restormer blocks [17] in both the encoder $E_k(\cdot)$ and decoder $D(\cdot)$ components. Each block contains 8 attention heads and has a feature dimension of 32. Both the encoders and decoder are configured with 4 stacked blocks. [...] $\mathcal{L} = \mathcal{L}_{\text{spatial}} + \alpha_1 \mathcal{L}_{\text{grad}} + \alpha_2 \mathcal{L}_{\text{temp}}$ (Eq. 6), where $\{\alpha_1, \alpha_2\}$ are weight parameters, set to {10, 2}, {1, 0.5}, {5, 2} and {1, 1} for MEF, MFF, IVF and MVF tasks respectively, such that the terms have comparable magnitudes. [...] $\epsilon$ is a predefined threshold (set to 1.0 in our implementation). |
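
The quoted experiment setup can be written down as a small configuration sketch. The Python below is illustrative only and is not taken from the authors' code: the function and constant names (`learning_rate`, `total_loss`, `LOSS_WEIGHTS`, `QUOTED_SPLITS`) are hypothetical, and it assumes that "decays exponentially to 1% of its initial value over 20k iterations" means a smooth per-iteration decay. The three loss terms are passed in as plain numbers, since their definitions are not reproduced in this summary.

```python
# Illustrative sketch of the quoted training configuration (names are hypothetical).

BASE_LR = 1e-4         # initial learning rate quoted in the setup
FINAL_FRACTION = 0.01  # learning rate ends at 1% of its initial value
TOTAL_ITERS = 20_000   # decay horizon quoted in the setup


def learning_rate(step: int) -> float:
    """Exponential decay from BASE_LR to FINAL_FRACTION * BASE_LR.

    Assumption: the decay is applied smoothly at every iteration and the
    rate is held constant once TOTAL_ITERS is reached.
    """
    step = min(max(step, 0), TOTAL_ITERS)
    return BASE_LR * FINAL_FRACTION ** (step / TOTAL_ITERS)


# (alpha_1, alpha_2) per task, as quoted alongside Eq. 6.
LOSS_WEIGHTS = {
    "MEF": (10.0, 2.0),
    "MFF": (1.0, 0.5),
    "IVF": (5.0, 2.0),
    "MVF": (1.0, 1.0),
}


def total_loss(task: str, l_spatial: float, l_grad: float, l_temp: float) -> float:
    """Combine the three terms as in Eq. 6: L = L_spatial + a1 * L_grad + a2 * L_temp."""
    alpha_1, alpha_2 = LOSS_WEIGHTS[task]
    return l_spatial + alpha_1 * l_grad + alpha_2 * l_temp


# (total scenes, training scenes, test scenes) in the order the splits are quoted.
QUOTED_SPLITS = [(500, 450, 50), (150, 120, 30), (90, 75, 15), (57, 49, 8)]


def splits_are_consistent() -> bool:
    """Check that every quoted train/test split adds up to its quoted total."""
    return all(train + test == total for total, train, test in QUOTED_SPLITS)


if __name__ == "__main__":
    for step in (0, 10_000, 20_000):
        print(step, learning_rate(step))
    print(total_loss("MEF", 1.0, 0.1, 0.2))
    print(splits_are_consistent())
```

Under the stated assumption, the schedule passes through one tenth of the initial rate at the halfway point (10k iterations). A step-wise decay would also match the quoted description, so this sketch should be read as one plausible interpretation.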