VFIMamba: Video Frame Interpolation with State Space Models

Authors: Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios. and 4 Experiments
Researcher Affiliation Collaboration Guozhen Zhang1,2 Chunxu Liu1 Yutao Cui2 Xiaotong Zhao2 Kai Ma2 Limin Wang1,3 1State Key Laboratory for Novel Software Technology, Nanjing University 2Platform and Content Group (PCG), Tencent 3Shanghai AI Lab
Pseudocode No The paper includes architectural diagrams (Figures 2, 3, 9, 10) but no structured pseudocode or algorithm blocks.
Open Source Code No We have no plans to open-source the source code at this time, but the training code and models will be open-sourced after publication.
Open Datasets Yes The low-resolution datasets include Vimeo90K (448×256) (Xue et al., 2019), UCF101 (256×256) (Soomro et al., 2012), and SNU-FILM (1280×720) (Reda et al., 2022). ... The high-resolution datasets include X-TEST (Sim et al., 2021), X-TEST-L (a more challenging subset selected by Liu et al. (2024a)), and Xiph (Montgomery, 1994).
Dataset Splits No The paper describes training data (Vimeo90K, X-TRAIN) and evaluation datasets (X-TEST, SNU-FILM, etc.), but does not explicitly specify validation splits, with percentages, sample counts, or citations to predefined splits, used for model development and hyperparameter tuning.
Hardware Specification Yes VFIMamba and VFIMamba-S were both trained on 4 NVIDIA 32GB V100 GPUs. and In Runtime, we evaluate the inference speed of each method on 1024×1024 resolution inputs on a 2080Ti GPU.
Software Dependencies No The paper mentions 'AdamW as our optimizer' but does not provide specific software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup Yes Training loss We used the same training loss as Zhang et al. (2023), which is a weighted combination of Laplacian loss (Niklaus & Liu, 2020) and warp loss (Liu et al., 2019), with weights of 1 and 0.5, respectively. Training setting For the data from Vimeo90K (Xue et al., 2019), we randomly cropped the frames from 256×448 to 256×256. For the data from X-TRAIN (Sim et al., 2021), ... The batch size for Vimeo90K is 32, and for X-TRAIN it is 8. We used AdamW as our optimizer with β1 = 0.9, β2 = 0.999, and a weight decay of 1×10⁻⁴. With warmup for 2,000 steps, the learning rate was gradually increased to 2×10⁻⁴, and then we used cosine annealing for 300 epochs to reduce the learning rate from 2×10⁻⁴ to 2×10⁻⁵.
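The schedule described above (linear warmup for 2,000 steps to 2×10⁻⁴, then cosine annealing down to 2×10⁻⁵) can be sketched as a plain function. This is a minimal illustration, not the authors' code; the total step count (`total_steps`) is a hypothetical placeholder, since the paper specifies the duration in epochs (300) rather than steps.

```python
import math

def lr_at_step(step, warmup_steps=2000, total_steps=100_000,
               lr_peak=2e-4, lr_min=2e-5):
    """Linear warmup to lr_peak, then cosine annealing to lr_min.

    total_steps is an assumed value; the paper trains for 300 epochs,
    so the true step count depends on dataset size and batch size.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to lr_peak over the warmup phase.
        return lr_peak * step / warmup_steps
    # Cosine decay from lr_peak (progress=0) to lr_min (progress=1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_peak - lr_min) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this would typically be realized with `torch.optim.AdamW` (betas=(0.9, 0.999), weight_decay=1e-4) plus a warmup wrapper around `CosineAnnealingLR`, matching the hyperparameters quoted from the paper.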