SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines

Authors: Shizun Wang, Weihong Zeng, Xu Wang, Hao Yang, Li Chen, Chuang Zhang, Ming Wu, Yi Yuan, Yunzhao Zeng, Min Zheng, Jing Liu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and advantageous flexibility of SwiftAvatar are also verified in both subjective and objective evaluations." |
| Researcher Affiliation | Collaboration | 1 Beijing University of Posts and Telecommunications; 2 Douyin Vision. {wangshizun, zhangchuang, wuming}@bupt.edu.cn; {zengweihong, wangxu.ailab, yang.hao, chenli.phd, yuanyi.cv, zengyunzhao, zhengmin.666, jing.liu}@bytedance.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology, nor a direct link to a code repository. |
| Open Datasets | Yes | "For evaluation, we choose 116 images from FFHQ dataset (Karras, Laine, and Aila 2019)... Pretrained SemanticStyleGAN on CelebAMask-HQ (Lee et al. 2020) is directly used as realistic generator G_real..." |
| Dataset Splits | No | The paper mentions a 'training stage' for the avatar estimator and an 'evaluation dataset' for human rating, but it does not provide the specific training/validation/test splits (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | "We implement our methods using PyTorch 1.10 library and perform all experiments on NVIDIA V100 GPUs." |
| Software Dependencies | Yes | "We implement our methods using PyTorch 1.10 library and perform all experiments on NVIDIA V100 GPUs." |
| Experiment Setup | Yes | "Batch size is set to 16, style mixing probability (Karras, Laine, and Aila 2019) is set to 0.3. λ_R1 and λ_path are set to 10 and 0.5, respectively. Lazy regularization (Karras et al. 2020) is applied every 16 mini-batches for the discriminator (R1 regularization) and every 4 mini-batches for the generator (path length regularization). All images used for the generators are aligned and resized to a resolution of 512×512. The optimization-based GAN inversion approach employs the Adam (Kingma and Ba 2014) optimizer in the paired data production stage, and the learning rate follows cosine annealing starting from 0.1. We optimize 200 steps for all latent codes, and λ_i, λ_p, λ_l are set to 0.1, 1 and 1, respectively... For semantic augmentation, we generate 10 augmented images for each latent code, using randomly generated noise in W space. We set λ_aug to 1 for the background... and also set λ_aug to 0.3 and 0.06 for the hair part and glasses part... In the avatar estimator training stage, the input images of the avatar estimator are resized to 224×224. We use the Adam optimizer with batch size 256 to train for 100 epochs. The learning rate is set to 1e-3 and decayed by half every 30 epochs." |