Controllable 3D Face Generation with Conditional Style Code Diffusion

Authors: Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on FFHQ, CelebA-HQ, and CelebA-Dialog demonstrate the promising performance of our TEx-Face in achieving the efficient and controllable generation of photorealistic 3D faces.
Researcher Affiliation | Collaboration | Xiaolong Shen (1,2,*), Jianxin Ma (2), Chang Zhou (2), Zongxin Yang (1). (1) ReLER, CCAI, Zhejiang University, China; (2) Alibaba Group, China. {sxlongcs, zongxinyang}@zju.edu.cn, {majx13fromthu, ericzhou.zc}@alibaba-inc.com. (*) Work done during Xiaolong Shen's internship at Alibaba.
Pseudocode | No | The paper describes the proposed methods in detail but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code will be available at https://github.com/sxl142/TEx-Face.
Open Datasets | Yes | We train our inversion model on FFHQ (Abdal, Qin, and Wonka 2019) and test it on the CelebA-HQ (Karras et al. 2018) test set. We use CelebA-Dialog (Jiang et al. 2021) and some data processed by our proposed data augmentation strategy to train our diffusion model.
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a validation set or specific split proportions (e.g., an 80/10/10 split or per-split sample counts for training, validation, and test).
Hardware Specification | Yes | We use four Nvidia Tesla V100 (16G) with batch size 8 to train our inversion model, and with batch size 256 for style code diffusion.
Software Dependencies | No | The paper mentions using the Adam optimizer and a Cosine Annealing scheduler but does not specify versions of key software dependencies such as the programming language (e.g., Python 3.8) or deep learning framework (e.g., PyTorch 1.9).
Experiment Setup | Yes | We use Adam (Kingma and Ba 2017) optimizer with linear warm-up and Cosine Annealing (Loshchilov and Hutter 2017) scheduler. We set the loss weights as follows: λrec = 1, λlpips = 0.8, λid = 0.2. We use four Nvidia Tesla V100 (16G) with batch size 8 to train our inversion model, and with batch size 256 for style code diffusion.
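The quoted setup (linear warm-up into cosine annealing, plus the three weighted loss terms) can be sketched framework-free as below. The base learning rate, warm-up length, and total step count are illustrative assumptions — the paper does not report them — while the loss weights are the ones quoted above (λrec = 1, λlpips = 0.8, λid = 0.2).

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=1000, total_steps=100_000):
    """Linear warm-up to base_lr, then cosine annealing to zero
    (Loshchilov and Hutter 2017). base_lr, warmup_steps, and
    total_steps are hypothetical values, not from the paper."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warm-up phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

def total_loss(l_rec, l_lpips, l_id):
    """Weighted sum of the inversion losses with the weights the
    paper reports: lambda_rec = 1, lambda_lpips = 0.8, lambda_id = 0.2."""
    return 1.0 * l_rec + 0.8 * l_lpips + 0.2 * l_id

# Illustrative usage with dummy per-term loss values.
lr = lr_at_step(step=5_000)
loss = total_loss(l_rec=0.5, l_lpips=0.25, l_id=0.1)
```

In a PyTorch training loop, the same schedule would typically be realized with `torch.optim.Adam` plus a warm-up wrapper around `CosineAnnealingLR`; the sketch above only makes the reported schedule and loss weighting explicit.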