Channel Attention Is All You Need for Video Frame Interpolation
Authors: Myungsub Choi, Heewon Kim, Bohyung Han, Ning Xu, Kyoung Mu Lee. Pages 10663–10671.
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation. |
| Researcher Affiliation | Collaboration | Myungsub Choi,1 Heewon Kim,1 Bohyung Han,1 Ning Xu,2 Kyoung Mu Lee1 1Computer Vision Lab. & ASRI, Seoul National University, 2Amazon Go {cms6539, ghimhw, bhhan, kyoungmu}@snu.ac.kr, ninxu@amazon.com |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | The source code for our framework is made public along with the pretrained models to facilitate reproduction.2 https://github.com/myungsub/CAIN |
| Open Datasets | Yes | We evaluate our model on three benchmark datasets commonly used in the recent works (Jiang et al. 2018; Liu et al. 2017; Niklaus and Liu 2018; Niklaus, Mai, and Liu 2017b; Xue et al. 2018): Middlebury optical flow (Baker et al. 2010), UCF101 (Soomro, Zamir, and Shah 2012), and Vimeo90K (Xue et al. 2018). |
| Dataset Splits | Yes | We use the training split of Vimeo90K (Xue et al. 2018) dataset for training... The initial learning rate is 0.0001, which is reduced by a factor of 2 whenever the validation loss stops decreasing for more than 5 epochs. Our evaluation benchmark has four different settings: Easy, Medium, Hard, and Extreme, depending on the temporal gap between two input frames. |
| Hardware Specification | Yes | A full training of our network takes about 4 days on a single Titan Xp GPU. |
| Software Dependencies | No | Our algorithm is implemented in PyTorch. The version number for PyTorch is not specified. |
| Experiment Setup | Yes | We use the training split of Vimeo90K (Xue et al. 2018) dataset for training, where our model is optimized by Adam (Kingma and Ba 2014) for 200 epochs (approximately 320K iterations); training is based on 256×256 patches and the batch size is 32. Random vertical and horizontal flipping, along with random temporal order swapping between the two input frames, are adopted for data augmentation. The initial learning rate is 0.0001, which is reduced by a factor of 2 whenever the validation loss stops decreasing for more than 5 epochs. We clip the gradient norm to be less than 0.1, which handles the gradient explosion issue. |
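The learning-rate schedule quoted above (halve the rate whenever the validation loss stops decreasing for more than 5 epochs) can be sketched in plain Python. This is a minimal sketch of the *described* behavior, not the authors' code; the function name and the exact patience semantics are assumptions — in a PyTorch training loop this would more likely be `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)`.

```python
def plateau_schedule(val_losses, init_lr=1e-4, factor=0.5, patience=5):
    """Return the learning rate used at each epoch, halving it whenever
    the validation loss has not improved for more than `patience` epochs.

    Hypothetical helper illustrating the schedule described in the paper;
    names and tie-breaking behavior are assumptions.
    """
    lr = init_lr
    best = float("inf")   # best validation loss seen so far
    bad_epochs = 0        # consecutive epochs without improvement
    lrs = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:  # "more than 5 epochs"
                lr *= factor
                bad_epochs = 0
        lrs.append(lr)
    return lrs

# Example: three improving epochs, then a plateau long enough to trigger a halving.
lrs = plateau_schedule([1.0, 0.9, 0.8] + [0.8] * 7)
```

With the example losses above, the rate stays at 1e-4 through the first five plateau epochs and drops to 5e-5 on the sixth, matching the "factor of 2 after more than 5 stagnant epochs" rule.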