VFIMamba: Video Frame Interpolation with State Space Models
Authors: Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios." and Section 4 (Experiments) |
| Researcher Affiliation | Collaboration | Guozhen Zhang (1,2), Chunxu Liu (1), Yutao Cui (2), Xiaotong Zhao (2), Kai Ma (2), Limin Wang (1,3); 1: State Key Laboratory for Novel Software Technology, Nanjing University; 2: Platform and Content Group (PCG), Tencent; 3: Shanghai AI Lab |
| Pseudocode | No | The paper includes architectural diagrams (Figures 2, 3, 9, 10) but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | We have no plans to open-source the source code at this time, but the training code and models will be open-sourced after publication. |
| Open Datasets | Yes | The low-resolution datasets include Vimeo90K (448×256) (Xue et al., 2019), UCF101 (256×256) (Soomro et al., 2012), and SNU-FILM (1280×720) (Reda et al., 2022). ... The high-resolution datasets include X-TEST (Sim et al., 2021), X-TEST-L (a more challenging subset selected by Liu et al. (2024a)), and Xiph (Montgomery, 1994). |
| Dataset Splits | No | The paper describes training data (Vimeo90K, X-TRAIN) and evaluation datasets (X-TEST, SNU-FILM, etc.), but does not explicitly detail specific validation dataset splits with percentages, sample counts, or predefined citations for their own model development and hyperparameter tuning. |
| Hardware Specification | Yes | VFIMamba and VFIMamba-S were both trained on 4 NVIDIA 32GB V100 GPUs. and In Runtime, we evaluate the inference speed of each method on 1024×1024 resolution inputs by a 2080Ti GPU. |
| Software Dependencies | No | The paper mentions 'AdamW as our optimizer' but does not provide specific software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, CUDA version). |
| Experiment Setup | Yes | Training loss: We used the same training loss as Zhang et al. (2023), which is a weighted combination of Laplacian loss (Niklaus & Liu, 2020) and warp loss (Liu et al., 2019), with weights of 1 and 0.5, respectively. Training setting: For the data from Vimeo90K (Xue et al., 2019), we randomly cropped the frames from 256×448 to 256×256. For the data from X-TRAIN (Sim et al., 2021), ... The batch size for Vimeo90K is 32, and for X-TRAIN it is 8. We used AdamW as our optimizer with β1 = 0.9, β2 = 0.999, and a weight decay of 1×10⁻⁴. With warmup for 2,000 steps, the learning rate was gradually increased to 2×10⁻⁴, and then we used cosine annealing for 300 epochs to reduce the learning rate from 2×10⁻⁴ to 2×10⁻⁵. |
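The learning-rate schedule quoted in the Experiment Setup row (linear warmup to 2×10⁻⁴ over 2,000 steps, then cosine annealing down to 2×10⁻⁵) can be sketched as a small standalone function. This is a minimal illustration, not the authors' code; the function name `lr_at` and the choice of expressing the 300-epoch cosine phase in optimizer steps are assumptions for the sketch.

```python
import math

# Constants taken from the paper's quoted training setting.
WARMUP_STEPS = 2_000   # linear warmup duration
PEAK_LR = 2e-4         # learning rate reached after warmup
MIN_LR = 2e-5          # final learning rate after cosine annealing

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step.

    total_steps is the full training length in steps (the paper's
    300 epochs converted to steps; the exact count depends on
    dataset size and batch size, so it is a parameter here).
    """
    if step < WARMUP_STEPS:
        # Linear warmup from ~0 up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine annealing from PEAK_LR down to MIN_LR.
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the same shape could be obtained with `torch.optim.lr_scheduler.SequentialLR` combining a `LinearLR` warmup with `CosineAnnealingLR(eta_min=2e-5)` on an `AdamW(betas=(0.9, 0.999), weight_decay=1e-4)` optimizer.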