Generalizable Implicit Motion Modeling for Video Frame Interpolation
Authors: Zujin Guo, Wei Li, Chen Change Loy
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present quantitative and qualitative evaluations of our motion modeling method GIMM in Section 4.1, and the corresponding interpolation method (GIMM-VFI) in Section 4.4. Specifically, we evaluate both motion quality and performance on the downstream interpolation task. We compare GIMM-VFI with current state-of-the-art VFI methods on arbitrary-timestep interpolation. |
| Researcher Affiliation | Academia | Zujin Guo, Wei Li, Chen Change Loy S-Lab, Nanyang Technological University {zujin.guo, wei.l, ccloy}@ntu.edu.sg |
| Pseudocode | No | The paper includes architectural diagrams (Figure 6, 7, 8) but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | We are unable to provide our code upon submission, but releasing the code to the public in the future is our plan. |
| Open Datasets | Yes | We train the GIMM model on the training split of Vimeo90K [54] triplets dataset using optical flows extracted by off-the-shelf flow estimators. |
| Dataset Splits | No | We train the GIMM model on the training split of Vimeo90K [54] triplets dataset... Our GIMM-VFI is trained on the complete Vimeo90K septuplet dataset. Specifically, we implement two variants of GIMM-VFI, using two different flow estimators: RAFT [50] and FlowFormer [19], designated as GIMM-VFI-R and GIMM-VFI-F, respectively. However, both versions of GIMM-VFI share the same training process. Similar to previous works [55, 20], we train our model on the complete Vimeo90K septuplet split [54] for 60 epochs with a batch size of 32 and a learning rate of 8 × 10⁻⁵. We randomly select triplet subsets for training from each septuplet, following the same sampling strategy as previous research [55, 20]. |
| Hardware Specification | Yes | using 2 NVIDIA V100 GPUs. [...] We train our model on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW' optimizer, 'Cosine annealing' learning rate schedule, and 'PReLU' activation function, and references 'RAFT [50]' and 'FlowFormer [19]' as flow estimators, but does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | We train the GIMM model on the training split of Vimeo90K [54] triplets dataset using optical flows extracted by off-the-shelf flow estimators. [...] randomly cropping the flows to a resolution of 256 × 256. For each batch during training, we randomly select a timestep t from the set {0, 0.5, 1} to supervise. We set the batch size to 64, and train the model for 240 epochs with a learning rate of 1 × 10⁻⁴. [...] We resize and randomly crop each frame into a resolution of 224 × 224 and perform a series of augmentations including rotation, flipping, temporal order reversing and channel order reversing. [...] for 60 epochs with a batch size of 32 and a learning rate of 8 × 10⁻⁵. Table 5 also provides: Optimizer (AdamW), Peak learning rate, Minimum learning rate, Epochs, Batch size per GPU, Weight decay, Optimizer momentum (β1, β2 = 0.9, 0.999), Learning rate schedule (Cosine annealing), Warmup epochs, Training Resolution. |
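The quoted setup combines a peak learning rate of 8 × 10⁻⁵ with a cosine-annealing schedule and warmup epochs (Table 5 of the paper). A minimal sketch of such a schedule is below; the warmup length and minimum learning rate are illustrative assumptions, since the paper's exact values are not quoted here.

```python
import math

def lr_at_epoch(epoch, peak_lr=8e-5, min_lr=1e-6,
                total_epochs=60, warmup_epochs=5):
    """Cosine-annealing LR with linear warmup (0-indexed epoch).

    peak_lr and total_epochs follow the quoted GIMM-VFI setup;
    min_lr and warmup_epochs are hypothetical placeholders.
    """
    if epoch < warmup_epochs:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from peak_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The schedule reaches `peak_lr` at the end of warmup and decays smoothly toward `min_lr` by the final epoch, matching the "Peak learning rate / Minimum learning rate / Warmup epochs / Cosine annealing" fields listed from Table 5.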