Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Authors: Xiaojuan Wang, Boyang Zhou, Brian Curless, Ira Kemelmacher-Shlizerman, Aleksander Holynski, Steve Seitz

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques. We compare our work qualitatively and quantitatively to related methods on two curated difficult datasets targeted for generative inbetweening: Davis (Pont-Tuset et al., 2017) and Pexels, and our method produces notably higher quality videos with more coherent dynamics given distant keyframes. Quantitative evaluation: For each dataset, we evaluate the generated in-between videos using FID (Heusel et al., 2017) and FVD (Ge et al., 2024), widely used metrics for evaluating generative models. These two metrics measure the distance between the distributions of generated frames/videos and actual ones. The results are shown in Tab. 1, and our method outperforms all of the baselines by a significant margin.
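For context on the FID metric quoted above: FID is the Fréchet (2-Wasserstein) distance between Gaussians fitted to feature embeddings of real and generated frames. A minimal sketch of the underlying formula, reduced to one dimension for clarity (the function name and the 1-D reduction are illustrative, not from the paper):

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    """Frechet (2-Wasserstein) distance between two 1-D Gaussians.

    FID applies the multivariate form of this formula to Gaussians
    fitted to Inception features of real vs. generated frames.
    """
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

d_same = frechet_distance_1d(0.0, 1.0, 0.0, 1.0)  # identical distributions -> 0.0
d_shift = frechet_distance_1d(0.0, 1.0, 2.0, 1.0)  # mean shift of 2 -> 4.0
```

In practice FID is computed over multivariate Gaussians fitted to Inception features, and FVD over video-network features; the 1-D case above only demonstrates the distance itself.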
Researcher Affiliation Collaboration 1University of Washington, 2Google DeepMind, 3UC Berkeley
Pseudocode Yes ALGORITHM 1: Light-weight backward motion fine-tuning. Input: fθ, p_data(x), E(·). while not converged do ... ALGORITHM 2: Dual-directional diffusion sampling. Input: I_0, I_{N-1}, fθ, fθ′, D(·). Compute conditions c_0, c_{N-1} from I_0, I_{N-1}; set z_T ∼ N(0, I); for t = T to 1 do ...
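The extracted pseudocode for Algorithm 2 can be sketched structurally in Python. This is a hedged sketch, not the authors' implementation: the latent is reduced to a list of per-frame scalars, the denoisers are toy stand-ins, and fusing the forward and time-reversed backward predictions by simple averaging is an assumption made for illustration:

```python
import random

def dual_directional_sample(z_T, c0, cN1, f_fwd, f_bwd, T=50):
    # f_fwd: base denoiser f_theta conditioned on the first keyframe I_0.
    # f_bwd: backward-motion fine-tuned denoiser f_theta' conditioned on I_{N-1}.
    z = z_T
    for t in range(T, 0, -1):
        eps_fwd = f_fwd(z, c0, t)
        # Run the backward branch on the temporally reversed latent,
        # then reverse its prediction back into forward frame order.
        eps_bwd = list(reversed(f_bwd(list(reversed(z)), cN1, t)))
        # Fuse the two directions (simple averaging; an illustrative choice).
        z = [zi - (ef + eb) / (2.0 * T) for zi, ef, eb in zip(z, eps_fwd, eps_bwd)]
    return z

# Toy stand-in denoisers: predict "noise" as the gap between latent and condition.
f_fwd = lambda z, c, t: [zi - c for zi in z]
f_bwd = lambda z, c, t: [zi - c for zi in z]

z_T = [random.gauss(0.0, 1.0) for _ in range(14)]  # a toy latent of 14 frames
sampled = dual_directional_sample(z_T, 0.0, 1.0, f_fwd, f_bwd, T=50)
```

The key structural point is the temporal reversal around the backward branch: the fine-tuned denoiser sees the frame sequence end-to-start, so its prediction must be flipped back before fusion with the forward branch.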
Open Source Code No The paper mentions using 'the publicly available model weights https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt' for Stable Video Diffusion, which is a base model. However, it does not provide any specific links or statements about open-sourcing the implementation code of their *own* method.
Open Datasets Yes We use two high-resolution (1080p) datasets for evaluations: (1) the Davis dataset (Pont-Tuset et al., 2017), where we create a total of 117 input pairs from all of the videos; this dataset mostly features articulated subject motions, such as animal or human motions. (2) The Pexels dataset, where we collect a total of 106 input keyframe pairs from a compiled collection of high-resolution videos on Pexels, featuring directional dynamic scene motions such as vehicles moving, animals or people running, surfing, wave movements, and time-lapse videos. (Footnote 3: https://www.pexels.com/)
Dataset Splits No The paper mentions that 'All input pairs are at least 25 frames apart and have the corresponding ground truth video clips.' and 'we create a total of 117 input pairs from all of the videos' for Davis and 'a total of 106 input keyframe pairs' for Pexels. However, it does not specify how these datasets were split into training, validation, or test sets for the experiments (e.g., percentages or exact counts for each split).
Hardware Specification Yes The training takes around 15K iterations with batch size of 4. We trained on 4 A100 GPUs.
Software Dependencies No The paper mentions using the 'Adam optimizer' and 'PyTorch pseudocode', but does not provide specific version numbers for Python, PyTorch, CUDA, or any other libraries or frameworks used in the implementation.
Experiment Setup Yes We use the Adam optimizer with a learning rate of 1e-4, β1 = 0.9, β2 = 0.999, and weight decay of 1e-2. The training takes around 15K iterations with batch size of 4. We trained on 4 A100 GPUs. For sampling, we apply 50 sampling steps. For other parameters in SVD, we use the default values: motion_bucket_id = 127, noise_aug_strength = 0.02.
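The reported optimizer settings (Adam with a separate weight-decay term) match a decoupled-weight-decay (AdamW-style) update, though the paper says only 'Adam'. A minimal pure-Python sketch of one such update step with the stated hyperparameters; the scalar-parameter reduction and the function name are illustrative, not from the paper:

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One decoupled-weight-decay Adam update for a single scalar parameter,
    using the hyperparameters reported in the paper."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Demo: minimise f(theta) = (theta - 0.5)^2 for a few thousand steps.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (theta - 0.5)
    theta, m, v = adamw_step(theta, grad, m, v, t)
```

In a PyTorch implementation this would simply be `torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-2)`; the loop above only shows the update rule those hyperparameters plug into.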