Vid2Game: Controllable Characters Extracted from Real-World Videos

Authors: Oran Gafni, Lior Wolf, Yaniv Taigman

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The method was tested on multiple video sequences, see the supplementary video (https://youtu.be/sNp6HskavBE). The first video shows a tennis player outdoors, the second video, a person swiping a sword indoors, and the third, a person walking. The part of the videos used for training consists of 5.5 min, 3.5 min, and 7.5 min, respectively. In addition, for comparative purposes, we trained the P2F network on a three-minute video of a dancer, which was part of the evaluation done by Wang et al. (2018a). A comparison of the P2F network with the pix2pixHD method of Wang et al. (2018b) is provided in Tab. 1, and as a figure in appendix Fig. 14. We compare by Structural Similarity (SSIM) (Wang et al., 2004) and Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) distance methods. The mean and standard deviation are calculated for each generated video. An ablation study is presented in appendix D, showing the contribution of the various components of the system both quantitatively and qualitatively.
Researcher Affiliation | Collaboration | Oran Gafni (Facebook AI Research) oran@fb.com; Lior Wolf (Facebook AI Research & Tel Aviv University) wolf@fb.com; Yaniv Taigman (Facebook AI Research) yaniv@fb.com
Pseudocode | No | The paper does not contain a dedicated section or figure labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | In addition, for comparative purposes, we trained the P2F network on a three-minute video of a dancer, which was part of the evaluation done by Wang et al. (2018a).
Dataset Splits | No | The paper mentions selecting a "test set that was not used during training" but does not specify training, validation, and test dataset splits with percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments (e.g., specific GPU/CPU models, memory details, or cloud instance types).
Software Dependencies | No | The paper mentions software components and frameworks like the "Adam optimizer", "LSGAN", "VGG", the "pix2pixHD framework", the "DensePose framework", and the "semantic segmentation method of Zhou et al. (2019)", but it does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2016) with a learning rate of $2 \times 10^{-4}$, $\beta_1 = 0.5$ and $\beta_2 = 0.999$. The loss applied to the generator can then be formulated as $\mathcal{L}_{LS}^{k} + \lambda_D \mathcal{L}_{FM}^{k_D} + \lambda_{VGG} \mathcal{L}_{FM}^{VGG}$ (Eq. 8), where the networks are trained with $\lambda_D = \lambda_{VGG} = 10$. The P2F generator loss is formulated as $\mathcal{L}_{LS}^{k} + \lambda_D \mathcal{L}_{FM}^{k_D} + \lambda_1 \mathcal{L}_{FM}^{VGG} + \lambda_2 \mathcal{L}_{mask}$ (Eq. 13), where $\lambda_1 = 10$ and $\lambda_2 = 1$. Hence, we train with an inter-frame interval of 2 (where an interval of 1 corresponds to using consecutive frames). During inference, we sample at 30 fps and apply a directional conditioning signal that has half of the average motion magnitude during training.
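
For readers trying to reproduce the evaluation quoted in the Research Type row (per-video mean and standard deviation of SSIM and LPIPS between generated and reference frames), the protocol maps onto standard libraries. The following is a minimal sketch, not the authors' code: the use of scikit-image for SSIM and the lpips package for LPIPS, as well as all function names, are our assumptions.

```python
# Minimal sketch (not the authors' code): per-video mean/std of SSIM and LPIPS
# between generated and reference frames, mirroring the quoted evaluation protocol.
# Assumes both inputs are equal-length lists of aligned uint8 RGB frames (HxWx3);
# the choice of scikit-image and the `lpips` package is our assumption.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

lpips_model = lpips.LPIPS(net="alex")  # learned perceptual metric (Zhang et al., 2018)

def _to_lpips_tensor(frame: np.ndarray) -> torch.Tensor:
    """uint8 HxWx3 RGB frame -> 1x3xHxW float tensor in [-1, 1], as LPIPS expects."""
    return (torch.from_numpy(frame).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

def video_scores(generated, reference):
    """Return (SSIM mean, SSIM std, LPIPS mean, LPIPS std) over all frame pairs."""
    ssim_vals, lpips_vals = [], []
    for gen, ref in zip(generated, reference):
        ssim_vals.append(structural_similarity(gen, ref, channel_axis=-1, data_range=255))
        with torch.no_grad():
            lpips_vals.append(lpips_model(_to_lpips_tensor(gen), _to_lpips_tensor(ref)).item())
    return (np.mean(ssim_vals), np.std(ssim_vals),
            np.mean(lpips_vals), np.std(lpips_vals))
```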
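
Similarly, the hyperparameters quoted in the Experiment Setup row translate directly into an optimizer configuration and the weighted loss sums of Eq. 8 and Eq. 13. The PyTorch sketch below assumes the individual loss terms (LS-GAN, discriminator feature matching, VGG feature matching, mask loss) have already been computed; the generator module and helper names are hypothetical, and only the numeric values come from the paper.

```python
# Minimal sketch (not the released implementation): quoted optimizer settings and
# loss weights. Only the numeric values are taken from the paper.
import torch

LR, BETA1, BETA2 = 2e-4, 0.5, 0.999   # Adam: lr = 2x10^-4, beta1 = 0.5, beta2 = 0.999
LAMBDA_D = LAMBDA_VGG = 10.0          # Eq. 8 weights
LAMBDA_1, LAMBDA_2 = 10.0, 1.0        # Eq. 13 weights (P2F generator)

def make_optimizer(generator: torch.nn.Module) -> torch.optim.Adam:
    """Adam optimizer with the hyperparameters quoted in the paper."""
    return torch.optim.Adam(generator.parameters(), lr=LR, betas=(BETA1, BETA2))

def generator_loss(l_ls, l_fm_d, l_fm_vgg):
    """Eq. 8: LS-GAN term plus weighted feature-matching terms."""
    return l_ls + LAMBDA_D * l_fm_d + LAMBDA_VGG * l_fm_vgg

def p2f_generator_loss(l_ls, l_fm_d, l_fm_vgg, l_mask):
    """Eq. 13: adds the mask loss with weight lambda_2 = 1."""
    return l_ls + LAMBDA_D * l_fm_d + LAMBDA_1 * l_fm_vgg + LAMBDA_2 * l_mask
```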