Generalizable Implicit Motion Modeling for Video Frame Interpolation

Authors: Zujin Guo, Wei Li, Chen Change Loy

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present quantitative and qualitative evaluations of our motion modeling method GIMM in Section 4.1, and the corresponding interpolation method (GIMM-VFI) in Section 4.4. Specifically, we evaluate both motion quality and performance on the downstream interpolation task. We compare GIMM-VFI with current state-of-the-art VFI methods on arbitrary-timestep interpolation."
Researcher Affiliation | Academia | Zujin Guo, Wei Li, Chen Change Loy; S-Lab, Nanyang Technological University; {zujin.guo, wei.l, ccloy}@ntu.edu.sg
Pseudocode | No | The paper includes architectural diagrams (Figures 6, 7, and 8) but does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | "We are unable to provide our code upon submission, but releasing the code to the public in the future is our plan."
Open Datasets | Yes | "We train the GIMM model on the training split of the Vimeo90K [54] triplets dataset using optical flows extracted by off-the-shelf flow estimators."
Dataset Splits | No | "We train the GIMM model on the training split of the Vimeo90K [54] triplets dataset... Our GIMM-VFI is trained on the complete Vimeo90K septuplet dataset. Specifically, we implement two variants of GIMM-VFI using two different flow estimators, RAFT [50] and FlowFormer [19], designated as GIMM-VFI-R and GIMM-VFI-F, respectively. However, both versions of GIMM-VFI share the same training process. Similar to previous works [55, 20], we train our model on the complete Vimeo90K septuplet split [54] for 60 epochs with a batch size of 32 and a learning rate of 8 × 10⁻⁵. We randomly select triplet subsets for training from each septuplet, following the same sampling strategy as previous research [55, 20]."
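The quoted text only says that triplet subsets are drawn at random from each 7-frame septuplet, deferring the exact strategy to [55, 20]. As a minimal sketch, assuming the common choice of evenly spaced triplets (the spacing constraint and function name are our assumptions, not taken from the paper):

```python
import random

def sample_triplet(septuplet):
    """Pick an evenly spaced triplet (i - d, i, i + d) from a 7-frame clip.

    The even spacing is an assumed detail of the sampling strategy in
    [55, 20]; the paper only states that triplet subsets are drawn at random.
    """
    n = len(septuplet)                   # 7 for Vimeo90K septuplets
    d = random.randint(1, (n - 1) // 2)  # temporal gap between frames
    i = random.randint(d, n - 1 - d)     # centre frame index
    return septuplet[i - d], septuplet[i], septuplet[i + d]
```

With spacing d and centre i, the middle frame always sits at timestep t = 0.5 between the two endpoint frames, which matches the supervised timesteps {0, 0.5, 1} quoted in the experiment setup below.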
Hardware Specification | Yes | "using 2 NVIDIA V100 GPUs. [...] We train our model on 8 NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, a cosine-annealing learning-rate schedule, and the PReLU activation, and references RAFT [50] and FlowFormer [19] as flow estimators, but does not provide specific version numbers for any of these.
Experiment Setup | Yes | "We train the GIMM model on the training split of the Vimeo90K [54] triplets dataset using optical flows extracted by off-the-shelf flow estimators. [...] randomly cropping the flows to a resolution of 256×256. For each batch during training, we randomly select a timestep t from the set {0, 0.5, 1} to supervise. We set the batch size to 64 and train the model for 240 epochs with a learning rate of 1 × 10⁻⁴. [...] We resize and randomly crop each frame to a resolution of 224×224 and perform a series of augmentations including rotation, flipping, temporal-order reversing, and channel-order reversing. [...] for 60 epochs with a batch size of 32 and a learning rate of 8 × 10⁻⁵." Table 5 also lists: optimizer AdamW; peak learning rate; minimum learning rate; epochs; batch size per GPU; weight decay; optimizer momentum β1, β2 = 0.9, 0.999; learning rate schedule: cosine annealing; warmup epochs; training resolution.
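The schedule fields from Table 5 (cosine annealing with warmup, a peak and a minimum learning rate) can be sketched as a per-epoch function. The peak learning rate and epoch count below follow the GIMM-VFI numbers quoted above; the minimum learning rate and warmup length are placeholders, since Table 5 lists those fields but their values are not reproduced here.

```python
import math

def lr_at_epoch(epoch, peak_lr=8e-5, min_lr=1e-6, warmup_epochs=5, total_epochs=60):
    """Cosine-annealed learning rate with linear warmup.

    peak_lr and total_epochs follow the GIMM-VFI training setup quoted
    above; min_lr and warmup_epochs are assumed placeholder values.
    """
    if epoch < warmup_epochs:
        # Linear ramp from ~0 up to the peak learning rate.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from peak_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, the rate reaches the peak at the end of warmup and decays smoothly toward the minimum by the final epoch; deep-learning frameworks typically ship an equivalent built-in scheduler, so this function only illustrates the shape of the curve.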