ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment
Authors: Yicheng Zhong, Huawei Wei, Peiji Yang, Zhisheng Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments illustrate that our method accomplishes expressive facial animation generation and offers enhanced flexibility in effectively conveying the desired style. |
| Researcher Affiliation | Industry | 1 Tencent Technology (Shenzhen) Co.Ltd {ajaxzhong, huaweiwei, peijiyang, plorywang}@tencent.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | MEAD is a talking-face video corpus featuring 60 actors talking with 8 different emotions at 3 different intensity levels. ... (Wang et al. 2020) ... BEAT comprises 76 hours of speech data, paired with 52D facial blendshape weights. ... (Liu et al. 2022) |
| Dataset Splits | No | For TEAD and MEAD-3D, the paper states, 'We use 90% of the data for training and the remaining 10% for testing'. It does not specify a separate validation split or its size for any of the datasets used (a minimal split sketch follows the table). |
| Hardware Specification | Yes | The entire framework is trained using the Adam optimizer (Kingma and Ba 2014) on a single A100 GPU. |
| Software Dependencies | No | The paper mentions 'Our framework is implemented by Pytorch (Paszke et al. 2019)' but does not provide specific version numbers for PyTorch or other key components used (e.g., CLIP ViT-B/32, the Adam optimizer). |
| Experiment Setup | Yes | ExpCLIP is trained with a learning rate of 1e-5 and a batch size of 256. ... An 8-layer transformer decoder is used... Each training sample has a duration of 64 frames with FPS=15. (A hedged configuration sketch follows the table.) |
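
Since only a 90%/10% train/test ratio is reported, the split could be reproduced along these lines. This is a minimal PyTorch sketch, assuming a fixed random seed and a placeholder `TensorDataset` in place of the real TEAD/MEAD-3D data; neither the seed nor the dataset construction is specified in the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for TEAD/MEAD-3D: 1,000 samples of 64-frame,
# 52D blendshape-weight sequences, matching the shapes quoted above.
dataset = TensorDataset(torch.randn(1000, 64, 52))

# The paper reports only a 90%/10% train/test split; no validation
# split is specified, so none is carved out here.
n_train = int(0.9 * len(dataset))
n_test = len(dataset) - n_train
generator = torch.Generator().manual_seed(0)  # seed is an assumption, not given in the paper
train_set, test_set = random_split(dataset, [n_train, n_test], generator=generator)

print(len(train_set), len(test_set))  # 900 100
```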
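
The quoted setup pins down the optimizer, learning rate, batch size, decoder depth, and clip length, but not the model width, inputs, or loss. The sketch below wires those reported values into a generic PyTorch training step; `D_MODEL`, `nhead`, the MSE loss, and the dummy audio/style/target tensors are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Values quoted in the table row; everything else below is assumed.
LR = 1e-5
BATCH_SIZE = 256
SEQ_LEN = 64   # frames per training sample (FPS = 15)
N_LAYERS = 8   # "An 8-layer transformer decoder is used"
D_MODEL = 512  # assumed width; not stated in the quoted setup

decoder_layer = nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=N_LAYERS)
head = nn.Linear(D_MODEL, 52)  # 52D blendshape weights, per the BEAT description

params = list(decoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=LR)  # Adam with lr=1e-5, as reported

# Dummy tensors standing in for per-frame audio features, style
# embeddings, and target blendshape curves; the real inputs are not
# fully specified in the quoted setup.
dataset = TensorDataset(
    torch.randn(1024, SEQ_LEN, D_MODEL),  # audio features (assumed shape)
    torch.randn(1024, 1, D_MODEL),        # style embedding (assumed shape)
    torch.randn(1024, SEQ_LEN, 52),       # target blendshape weights
)
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

for audio_feats, style_emb, target in loader:
    hidden = decoder(tgt=audio_feats, memory=style_emb)
    pred = head(hidden)                          # (B, SEQ_LEN, 52)
    loss = nn.functional.mse_loss(pred, target)  # loss choice is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break  # single illustrative step
```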