Channel Attention Is All You Need for Video Frame Interpolation

Authors: Myungsub Choi, Heewon Kim, Bohyung Han, Ning Xu, Kyoung Mu Lee (pp. 10663-10671)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation.
Researcher Affiliation | Collaboration | Myungsub Choi (1), Heewon Kim (1), Bohyung Han (1), Ning Xu (2), Kyoung Mu Lee (1); (1) Computer Vision Lab. & ASRI, Seoul National University; (2) Amazon Go; {cms6539, ghimhw, bhhan, kyoungmu}@snu.ac.kr, ninxu@amazon.com
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper.
Open Source Code | Yes | The source code for our framework is made public along with the pretrained models to facilitate reproduction: https://github.com/myungsub/CAIN
Open Datasets | Yes | We evaluate our model on three benchmark datasets commonly used in the recent works (Jiang et al. 2018; Liu et al. 2017; Niklaus and Liu 2018; Niklaus, Mai, and Liu 2017b; Xue et al. 2018): Middlebury optical flow (Baker et al. 2010), UCF101 (Soomro, Zamir, and Shah 2012), and Vimeo90K (Xue et al. 2018).
Dataset Splits | Yes | We use the training split of the Vimeo90K (Xue et al. 2018) dataset for training... The initial learning rate is 0.0001, which is reduced by a factor of 2 whenever the validation loss stops decreasing for more than 5 epochs. Our evaluation benchmark has four different settings: Easy, Medium, Hard, and Extreme, depending on the temporal gap between the two input frames.
Hardware Specification | Yes | A full training of our network takes about 4 days on a single Titan Xp GPU.
Software Dependencies | No | Our algorithm is implemented in PyTorch. The PyTorch version number is not specified.
Experiment Setup | Yes | We use the training split of the Vimeo90K (Xue et al. 2018) dataset for training, where our model is optimized by Adam (Kingma and Ba 2014) for 200 epochs (approximately 320K iterations); training is based on 256×256 patches and the batch size is 32. Random vertical and horizontal flipping along with random temporal order swapping between the two input frames is adopted for data augmentation. The initial learning rate is 0.0001, which is reduced by a factor of 2 whenever the validation loss stops decreasing for more than 5 epochs. We clip the gradient norm to be less than 0.1, which handles the gradient explosion issue.
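
Below is a minimal PyTorch sketch of the training recipe quoted above, for orientation only. The hyperparameters (Adam with initial learning rate 0.0001, halved when the validation loss plateaus for more than 5 epochs; gradient-norm clipping at 0.1; batch size 32 of 256×256 patches; flip and temporal-swap augmentation) come from the paper. The stand-in convolutional model, synthetic tensors, and L1 loss are illustrative assumptions, not the authors' CAIN implementation; see the repository linked above for the real code.

    import random
    import torch
    import torch.nn as nn

    # Stand-in for the CAIN network: maps two stacked RGB frames to one frame.
    model = nn.Conv2d(6, 3, kernel_size=3, padding=1)
    criterion = nn.L1Loss()  # assumption; the paper's training loss may differ
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Halve the learning rate when the validation loss stops decreasing
    # for more than 5 epochs, per the paper's schedule.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=5)

    for epoch in range(200):  # ~320K iterations at batch size 32 in the paper
        # Synthetic batch for illustration; real training samples random
        # 256x256 crops from the Vimeo90K training split.
        frame0 = torch.rand(32, 3, 256, 256)
        frame2 = torch.rand(32, 3, 256, 256)
        target = torch.rand(32, 3, 256, 256)  # ground-truth middle frame

        # Augmentation: random horizontal/vertical flips applied to all
        # three frames, plus random temporal order swapping of the inputs.
        if random.random() < 0.5:
            frame0, frame2, target = [torch.flip(t, dims=[-1])
                                      for t in (frame0, frame2, target)]
        if random.random() < 0.5:
            frame0, frame2, target = [torch.flip(t, dims=[-2])
                                      for t in (frame0, frame2, target)]
        if random.random() < 0.5:
            frame0, frame2 = frame2, frame0  # target (middle frame) unchanged

        loss = criterion(model(torch.cat([frame0, frame2], dim=1)), target)
        optimizer.zero_grad()
        loss.backward()
        # Clip the gradient norm to 0.1 to handle gradient explosion.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
        optimizer.step()

        # In practice, step the scheduler on a held-out validation loss;
        # the training loss is reused here only to keep the sketch runnable.
        scheduler.step(loss.item())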