ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

Authors: Yicheng Zhong, Huawei Wei, Peiji Yang, Zhisheng Wang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Comprehensive experiments illustrate that our method accomplishes expressive facial animation generation and offers enhanced flexibility in effectively conveying the desired style. |
| Researcher Affiliation | Industry | Tencent Technology (Shenzhen) Co., Ltd. {ajaxzhong, huaweiwei, peijiyang, plorywang}@tencent.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | MEAD is a talking-face video corpus featuring 60 actors talking with 8 different emotions at 3 different intensity levels. ... (Wang et al. 2020) ... BEAT comprises 76 hours of speech data, paired with 52D facial blendshape weights. ... (Liu et al. 2022) |
| Dataset Splits | No | For TEAD and MEAD-3D, the paper states, 'We use 90% of the data for training and the remaining 10% for testing'. It does not specify a separate validation split or its size for any of the datasets used. (See the split sketch after the table.) |
| Hardware Specification | Yes | The entire framework is trained using the Adam optimizer (Kingma and Ba 2014) on a single A100 GPU. |
| Software Dependencies | No | The paper mentions 'Our framework is implemented by Pytorch (Paszke et al. 2019)' but does not provide specific version numbers for PyTorch or other key components used (e.g., CLIP-ViT-B/32, the Adam optimizer). |
| Experiment Setup | Yes | ExpCLIP is trained with a learning rate of 1e-5 and a batch size of 256. ... An 8-layer transformer decoder is used... Each training sample has a duration of 64 frames with FPS=15. (See the training-configuration sketch after the table.) |
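
The paper reports only a 90% train / 10% test division for TEAD and MEAD-3D, with no validation split. The sketch below shows one way such a split could be reproduced; the use of `random_split` and the seed are assumptions, since the paper does not say how samples were assigned.

```python
# Minimal sketch of the 90/10 train/test split described for TEAD and MEAD-3D.
# The seed and the use of random_split are assumptions, not details from the paper.
import torch
from torch.utils.data import random_split

def split_90_10(dataset, seed=0):
    n_total = len(dataset)
    n_train = int(0.9 * n_total)   # 90% of the data for training
    n_test = n_total - n_train     # remaining 10% for testing
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_test], generator=generator)
```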
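The training details the paper does state (Adam optimizer, learning rate 1e-5, batch size 256, an 8-layer transformer decoder, 64-frame clips at 15 FPS, single A100 GPU, PyTorch) can be collected into a hedged PyTorch sketch. The model width, number of attention heads, and the dataset object are assumptions not given in the paper.

```python
# Hedged sketch of the reported training configuration: Adam, lr 1e-5,
# batch size 256, an 8-layer transformer decoder, 64-frame samples at 15 FPS.
# D_MODEL, nhead, and the training dataset are assumptions, not reported values.
import torch
from torch import nn
from torch.utils.data import DataLoader

FRAMES_PER_CLIP = 64   # "Each training sample has a duration of 64 frames"
FPS = 15               # "with FPS=15"
D_MODEL = 512          # assumed hidden size (not stated in the paper)

decoder_layer = nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=8)   # 8-layer transformer decoder

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-5)    # reported learning rate

# train_dataset is a placeholder for the (unreleased) TEAD / MEAD-3D training data:
# loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
```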