Robust Video Portrait Reenactment via Personalized Representation Quantization

Authors: Kaisiyuan Wang, Changcheng Liang, Hang Zhou, Jiaxiang Tang, Qianyi Wu, Dongliang He, Zhibin Hong, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments have been conducted to validate the effectiveness of our approach.
Researcher Affiliation | Collaboration | Kaisiyuan Wang (1), Changcheng Liang (2), Hang Zhou (3)*, Jiaxiang Tang (4), Qianyi Wu (5), Dongliang He (3), Zhibin Hong (3), Jingtuo Liu (3), Errui Ding (3), Ziwei Liu (6), Jingdong Wang (3); affiliations: (1) The University of Sydney, (2) Xidian University, (3) Baidu Inc., (4) Peking University, (5) Monash University, (6) S-Lab, Nanyang Technological University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We evaluate our methods on eight video sequences including five videos from the HDTF (Zhang et al. 2021) dataset, one video from the ADNerf (Guo et al. 2021) dataset, one video from the LSP (Lu, Chai, and Cao 2021) dataset and one video from the Nerface (Gafni et al. 2021) dataset.
Dataset Splits | No | The paper mentions training and testing but does not give explicit training/validation/test splits (percentages or counts) needed to reproduce the data partitioning. It mentions '1000 frames from the test set of each subject' but not the overall dataset size or the remaining splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'implemented on PyTorch' but does not specify software versions for PyTorch or other dependencies.
Experiment Setup | Yes | All experiments are implemented in PyTorch using the Adam optimizer with an initial learning rate of 5e-4 and a batch size of 4. Note that, as we adopt a temporal training strategy, the 4 images in a batch are consecutive frames collected from the same video clip. The training procedure is performed in a self-reenactment manner for both stages. For both VQGAN (Esser, Rombach, and Ommer 2021) and ViT (Dosovitskiy et al. 2021), we use their standard blocks.
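
Since no code is released, the reported setup can only be illustrated, not reproduced exactly. The sketch below shows, under stated assumptions, how the described temporal batching and optimizer settings might look in PyTorch: only the Adam optimizer, the 5e-4 initial learning rate, the batch of 4 consecutive frames drawn from one video clip, and the self-reenactment (reconstruction) objective come from the paper; the dataset class, the placeholder model, and the L1 loss are hypothetical stand-ins for the actual VQGAN/ViT pipeline.

```python
# Minimal sketch of the reported training setup. Assumptions are marked;
# only lr=5e-4, batch size 4, consecutive-frame batching, and the
# self-reenactment objective are taken from the paper.
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class ConsecutiveFrameClips(Dataset):
    """Hypothetical loader: yields clips of 4 consecutive frames from one video."""

    def __init__(self, videos, clip_len=4):
        self.videos = videos            # list of tensors, each shaped (T, 3, H, W)
        self.clip_len = clip_len
        # Every valid starting position of a clip in every video.
        self.index = [(v, t) for v, vid in enumerate(videos)
                      for t in range(vid.shape[0] - clip_len + 1)]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        v, t = self.index[i]
        return self.videos[v][t:t + self.clip_len]      # (4, 3, H, W)


# Placeholder model; the paper uses a VQGAN-based generator with ViT blocks.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # initial LR from the paper

videos = [torch.randn(32, 3, 256, 256)]                     # dummy video for illustration
loader = DataLoader(ConsecutiveFrameClips(videos), batch_size=1, shuffle=True)

for clip in loader:                      # clip: (1, 4, 3, H, W); one clip == one batch
    frames = clip.squeeze(0)             # the 4 consecutive frames forming the batch
    recon = model(frames)                # self-reenactment: reconstruct the same frames
    loss = nn.functional.l1_loss(recon, frames)  # placeholder reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here each DataLoader item is already a 4-frame clip, so a loader batch size of 1 realizes the paper's "4 consecutive frames from the same video clip" batching; the actual losses and two-stage schedule are not specified in enough detail to reproduce here.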