Video Inpainting by Jointly Learning Temporal Structure and Spatial Details

Authors: Chuan Wang, Haibin Huang, Xiaoguang Han, Jue Wang

AAAI 2019, pp. 5232-5239 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide qualitative and quantitative evaluation on three datasets, demonstrating that our method outperforms previous learning-based video inpainting methods.
Researcher Affiliation | Collaboration | 1 Megvii (Face++), {wangchuan, huanghaibin, wangjue}@megvii.com; 2 Shenzhen Research Inst. of Big Data, The Chinese University of Hong Kong, Shenzhen, China, hanxiaoguang@cuhk.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper provides a 'Project Page' URL (http://wangchuan.github.io/archive/research/videoinp/) but does not explicitly state that the source code for the methodology is available there, nor does it provide a direct link to a code repository.
Open Datasets | Yes | To validate our 3D-2D combined completion network, we tested on three datasets, FaceForensics (Rössler et al. 2018), 300VW (Chrysos et al. 2015) and Caltech (Dollár et al. 2012).
Dataset Splits | Yes | For each dataset, we separate the whole data samples into training and validation sets and control their proportion 5 : 1.
Hardware Specification | Yes | The implementation is based on TensorFlow and the network training is performed on a single NVIDIA GeForce GTX 1080 Ti.
Software Dependencies | No | The paper mentions 'TensorFlow' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The CombCN is trained with 100k iterations by an Adam optimizer, whose regression weight and learning rate are set to 0.01 and 0.001, respectively. Each frame is in 128^2 resolution, F, H, W, r are set to 32, 128, 128 and 2 respectively. We randomly generate a hole across all frames in the [0.375l, 0.5l] pixel range. (A configuration sketch based on these reported values follows the table.)
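
To make the reported experiment setup concrete, below is a minimal TensorFlow 2.x training-step sketch that wires up the quoted hyperparameters (Adam with learning rate 0.001, regression weight 0.01, 100k iterations, 32-frame clips at 128x128, and a random hole with side length in [0.375l, 0.5l]) together with the 5 : 1 train/validation split. The two-input model interface, the interpretation of the regression weight as the coefficient on an L1 reconstruction term, and all helper names are illustrative assumptions; this is not the authors' released code.

```python
import numpy as np
import tensorflow as tf

# Values quoted in the "Experiment Setup" row above.
NUM_ITERS = 100_000        # training iterations for the completion network
LEARNING_RATE = 1e-3       # Adam learning rate
REGRESSION_WEIGHT = 0.01   # assumed: coefficient on the L1 regression term
FRAME_SIZE = 128           # l: each frame is 128 x 128 (H = W = 128)
NUM_FRAMES = 32            # F: frames per training clip
DOWNSAMPLE_R = 2           # r: reported spatial downsampling factor

def split_train_val(samples):
    """Split samples into training and validation sets at a 5 : 1 proportion."""
    cut = len(samples) * 5 // 6
    return samples[:cut], samples[cut:]

def random_hole_mask(num_frames, size, rng=np.random):
    """One square hole shared by all frames, side length drawn from [0.375*l, 0.5*l]."""
    side = int(rng.uniform(0.375 * size, 0.5 * size))
    top = rng.randint(0, size - side + 1)
    left = rng.randint(0, size - side + 1)
    mask = np.zeros((num_frames, size, size, 1), dtype=np.float32)
    mask[:, top:top + side, left:left + side, :] = 1.0
    return mask

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

def train_step(model, frames):
    """One iteration: mask a clip, inpaint it, apply the weighted L1 loss."""
    mask = tf.constant(random_hole_mask(NUM_FRAMES, FRAME_SIZE))
    masked = frames * (1.0 - mask)
    with tf.GradientTape() as tape:
        completed = model([masked, mask], training=True)  # hypothetical two-input inpainting model
        loss = REGRESSION_WEIGHT * tf.reduce_mean(tf.abs(completed - frames))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

A full reproduction would run train_step for NUM_ITERS iterations over batches drawn from the training split and monitor the validation split; the adversarial and 3D-subnetwork terms of the paper's complete objective are omitted from this sketch.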