First Order Motion Model for Image Animation

Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment (variable | result | LLM response):

Research Type: Experimental
LLM Response: "Our framework scores best on diverse benchmarks and on a variety of object categories. Our source code is publicly available." (Sec. 4, Experiments: "Datasets. We train and test our method on four different datasets containing various objects."; Table 1: quantitative ablation study for video reconstruction on Tai-Chi-HD; Table 3: video reconstruction, comparison with the state of the art on four different datasets.)

Researcher Affiliation: Collaboration
LLM Response:
Aliaksandr Siarohin, DISI, University of Trento (aliaksandr.siarohin@unitn.it)
Stéphane Lathuilière, DISI, University of Trento and LTCI, Télécom Paris, Institut Polytechnique de Paris (stephane.lathuiliere@telecom-paris.fr)
Sergey Tulyakov, Snap Inc. (stulyakov@snap.com)
Elisa Ricci, DISI, University of Trento and Fondazione Bruno Kessler (e.ricci@unitn.it)
Nicu Sebe, DISI, University of Trento and Huawei Technologies Ireland (niculae.sebe@unitn.it)

Pseudocode: No
LLM Response: The paper describes the method but does not include any explicitly labeled pseudocode or algorithm blocks.

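Since the paper presents its method only in prose and equations, a minimal sketch of its central idea, the first-order (Taylor) approximation of motion around each keypoint, is given below. This is not the authors' code: the function name, tensor shapes, and PyTorch conventions are assumptions made for illustration.

```python
import torch

def first_order_local_grids(kp_src, kp_drv, jac_src, jac_drv, h, w):
    """Sketch of the first-order motion approximation,
    T_{S<-D}(z) ~= p_k^S + J_k (z - p_k^D)  with  J_k = J_k^S (J_k^D)^{-1}.

    kp_src, kp_drv:   (B, K, 2) keypoints in normalized [-1, 1] coordinates.
    jac_src, jac_drv: (B, K, 2, 2) local Jacobians predicted per keypoint.
    Returns (B, K, H, W, 2): one local sampling grid per keypoint.
    """
    B, K, _ = kp_src.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")              # (H, W) each
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, h, w, 2)    # identity grid z
    jac = jac_src @ torch.inverse(jac_drv)                      # J_k, (B, K, 2, 2)
    diff = grid - kp_drv.view(B, K, 1, 1, 2)                    # z - p_k^D
    warped = torch.einsum("bkij,bkhwj->bkhwi", jac, diff)       # J_k (z - p_k^D)
    return warped + kp_src.view(B, K, 1, 1, 2)
```

In the paper, these K local warps (together with an identity/background warp) are combined by a dense motion network that predicts combination masks and an occlusion map before the generator renders the output frame.
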
Open Source Code: Yes
LLM Response: "Our source code is publicly available." Repository: https://github.com/AliaksandrSiarohin/first-order-model

Open Datasets: Yes
LLM Response: "The VoxCeleb dataset [22] is a face dataset of 22496 videos, extracted from YouTube videos." "The UvA-Nemo dataset [9] is a facial analysis dataset that consists of 1240 videos." "The BAIR robot pushing dataset [10] contains videos collected by a Sawyer robotic arm pushing diverse objects over a table." "This dataset is referred to as the Tai-Chi-HD dataset. The dataset will be made publicly available." (Tai-Chi-HD is the authors' own collection of Tai-Chi videos.)

Dataset Splits: No
LLM Response: "For pre-processing, we extract an initial bounding box in the first video frame." "Overall, we obtain 12331 training videos and 444 test videos, with lengths varying from 64 to 1024 frames." (VoxCeleb) "Similar to Wang et al. [38], we use 1116 videos for training and 124 for evaluation." (UvA-Nemo) "It consists of 42880 training and 128 test videos." (BAIR) "Finally, we obtain 3049 and 285 video chunks for training and testing respectively with video length varying from 128 to 1024 frames." (Tai-Chi-HD) The paper does not specify a distinct validation split or its size/percentage for any of the datasets.

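For convenience, the snippet below collects the train/test counts quoted above into one structure (there is no validation split to record). The dataset-to-count mapping follows Section 4 of the paper; the dictionary itself is purely illustrative.

```python
# Train/test counts reported in Sec. 4; no validation split is described.
REPORTED_SPLITS = {
    "VoxCeleb":   {"train": 12331, "test": 444},   # video lengths 64-1024 frames
    "UvA-Nemo":   {"train": 1116,  "test": 124},   # split follows Wang et al. [38]
    "BAIR":       {"train": 42880, "test": 128},
    "Tai-Chi-HD": {"train": 3049,  "test": 285},   # video chunks of 128-1024 frames
}

for name, counts in REPORTED_SPLITS.items():
    total = counts["train"] + counts["test"]
    print(f"{name}: {counts['train']} train / {counts['test']} test ({total} total)")
```
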
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).

Software Dependencies: No
LLM Response: The paper mentions using a pre-trained VGG-19 network and a U-Net architecture, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).

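The only model-level dependencies mentioned are a pre-trained VGG-19 (used for the perceptual loss) and U-Net style networks. As a sketch of how such a dependency is typically satisfied, the class below builds a frozen VGG-19 feature extractor with torchvision; the class name and layer cut points are assumptions for illustration, not versions or settings taken from the paper.

```python
import torch
import torchvision

class VGG19Features(torch.nn.Module):
    """Frozen ImageNet-pretrained VGG-19 used only as a perceptual-loss network."""

    def __init__(self, cut_points=(2, 7, 12, 21, 30)):   # illustrative layer cuts
        super().__init__()
        # Older torchvision API; newer versions use the weights= argument instead.
        vgg = torchvision.models.vgg19(pretrained=True).features
        layers = list(vgg.children())
        self.slices = torch.nn.ModuleList()
        prev = 0
        for cut in cut_points:
            self.slices.append(torch.nn.Sequential(*layers[prev:cut]))
            prev = cut
        for p in self.parameters():
            p.requires_grad_(False)   # never updated during training

    def forward(self, x):
        feats = []
        for block in self.slices:
            x = block(x)
            feats.append(x)
        return feats                  # one feature map per slice
```
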
Experiment Setup: Yes
LLM Response: "In all experiments we use K = 10 as in [28]." "In all our experiments, we employ σ = 0.01 following Jakab et al. [18]." "The resolutions are 256×256, 128×128, 64×64 and 32×32." "There are 20 loss terms in total. Therefore, we use equal loss weights in all our experiments."

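To make the quoted settings concrete, the sketch below pins K = 10 keypoints, σ = 0.01 for the Gaussian keypoint heatmaps, the four pyramid resolutions, and an equal-weight combination of loss terms. How σ enters the Gaussian and how the 20 terms are enumerated are not spelled out in the quotes, so those details are assumptions.

```python
import torch
import torch.nn.functional as F

K_KEYPOINTS = 10                          # "In all experiments we use K=10"
SIGMA = 0.01                              # "we employ sigma = 0.01"
PYRAMID_RESOLUTIONS = (256, 128, 64, 32)  # resolutions of the image pyramid

def keypoints_to_heatmaps(kp, h, w, sigma=SIGMA):
    """Encode (B, K, 2) keypoints in [-1, 1] as (B, K, H, W) Gaussian heatmaps.
    Sigma is treated here as the Gaussian's variance (one plausible reading)."""
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, h, w, 2)
    dist2 = (grid - kp.view(kp.shape[0], kp.shape[1], 1, 1, 2)).pow(2).sum(-1)
    return torch.exp(-0.5 * dist2 / sigma)

def image_pyramid(img):
    """Downsample an image batch to the four resolutions used by the losses."""
    return [F.interpolate(img, size=(r, r), mode="bilinear", align_corners=False)
            for r in PYRAMID_RESOLUTIONS]

def combine_losses(loss_terms):
    """Equal weighting of all loss terms (the paper reports 20 terms in total)."""
    return torch.stack(list(loss_terms)).sum()
```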