Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

FrameBridge: Improving Image-to-Video Generation with Bridge Models

Authors: Yuji Wang, Zehua Chen, Xiaoyu Chen, Yixiang Wei, Jun Zhu, Jianfei Chen

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments conducted on WebVid-2M and UCF-101 demonstrate the superior quality of FrameBridge in comparison with its diffusion counterpart (zero-shot FVD 95 vs. 192 on MSR-VTT and non-zero-shot FVD 122 vs. 171 on UCF-101), and the advantages of our proposed SAF and neural prior for bridge-based I2V models.
Researcher Affiliation Collaboration 1Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University 2Shengshu, Beijing, China. Correspondence to: Jianfei Chen <EMAIL>.
Pseudocode Yes Algorithm 1: Training algorithm for I2V diffusion models. Algorithm 2: Sampling algorithm for FrameBridge. Algorithm 3: Sampling algorithm for I2V diffusion models. Algorithm 4: Training algorithm for FrameBridge.
Open Source Code No The project page: https://framebridge-icml.github.io/. The text mentions a project page, which is often a high-level overview or demonstration page rather than a direct link to a code repository, and does not explicitly state that code is provided.
Open Datasets Yes Experiments conducted on WebVid-2M (Bain et al., 2021) and UCF-101 (Soomro, 2012) demonstrate the superior quality of FrameBridge... zero-shot FVD 95 vs. 192 on MSR-VTT (Xu et al., 2016)
Dataset Splits Yes UCF-101 is an open-sourced video dataset consisting of 13320 video clips, each categorized into one of 101 action classes. There are three official train-test splits, each of which divides the whole dataset into 9537 training video clips and 3783 test video clips.
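For reference, the official UCF-101 splits are distributed as plain-text annotation files (trainlist01.txt, testlist01.txt, etc.). A minimal parsing sketch, assuming the released line format of "<ClassName>/<clip>.avi <class_index>" for train lists and "<ClassName>/<clip>.avi" for test lists:

```python
def parse_split(lines):
    """Parse UCF-101 split-file lines into (relative_path, class_name) pairs.

    Train lists carry a trailing class index; test lists do not. Both are
    handled by taking only the first whitespace-separated token.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path = line.split()[0]            # drop the optional class index
        class_name = path.split("/")[0]   # class name is the directory, e.g. "ApplyEyeMakeup"
        entries.append((path, class_name))
    return entries

sample = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"]
print(parse_split(sample))  # [('ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi', 'ApplyEyeMakeup')]
```

In practice the same function can be applied to the file contents via `parse_split(open("trainlist01.txt"))`; combining the three official splits should recover the quoted 9537/3783 partition of the 13320 clips.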
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. It mentions 'computational resources' but no explicit specifications.
Software Dependencies No The paper mentions software components like the 'AdamW optimizer' and 'BFloat16' (a data type), but does not provide specific version numbers for any ancillary software dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We fine-tune the models ϵ̂_Ψ for 20k iterations or 100k iterations with batch size 64. We use the AdamW optimizer with learning rate 1×10⁻⁵ and mixed precision in BFloat16.
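The quoted setup can be collected into a small configuration sketch; the dictionary keys and the helper below are illustrative, not taken from any released code:

```python
# Fine-tuning hyperparameters as reported in the paper (20k-iteration schedule;
# a 100k-iteration variant is also reported).
FINETUNE_CONFIG = {
    "iterations": 20_000,
    "batch_size": 64,
    "optimizer": "AdamW",
    "learning_rate": 1e-5,
    "precision": "bfloat16",  # mixed-precision training
}

def samples_seen(cfg):
    """Total number of training samples processed under this schedule."""
    return cfg["iterations"] * cfg["batch_size"]

print(samples_seen(FINETUNE_CONFIG))  # 1280000
```

At 20k iterations with batch size 64, the model sees 1.28M samples; the 100k-iteration schedule would see five times as many.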