SECap: Speech Emotion Captioning with Large Language Model
Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shi-Xiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of objective and subjective evaluations demonstrate that: 1) the SECap framework outperforms the HTSAT-BART baseline in all objective evaluations; 2) SECap can generate high-quality speech emotion captions that attain performance on par with human annotators in subjective mean opinion score tests. |
| Researcher Affiliation | Collaboration | 1Shenzhen International Graduate School, Tsinghua University, Shenzhen, China 2Tencent AI Lab 3The Chinese University of Hong Kong, Hong Kong SAR, China |
| Pseudocode | No | The paper describes the model architecture and training processes with mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes, models and results: https://github.com/thuhcsi/SECap |
| Open Datasets | Yes | Due to the lack of publicly available SEC datasets, we utilize an internal dataset called EMOSpeech. ... Please refer to the project's GitHub repository for the detailed dataset construction process, where the test set is also publicly available. |
| Dataset Splits | Yes | Upon constructing the EMOSpeech dataset, we randomly select 600 sentences for testing, 600 sentences for validation, and the remaining 29,326 sentences for training. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., specific GPU/CPU models, memory amounts) used for running experiments. |
| Software Dependencies | No | The paper mentions software components and cites associated research papers (e.g., 'LLaMA (Cui, Yang, and Yao 2023)', 'HuBERT (Hsu et al. 2021)', 'BERT-base (Devlin et al. 2019)'), but it does not provide explicit version numbers for the underlying software libraries or frameworks (e.g., PyTorch version, Python version). |
| Experiment Setup | No | The paper describes the general training process (two-stage, frozen parameters, model initialization) and mentions pre-trained models. However, it does not explicitly state specific hyperparameter values such as learning rates, batch sizes, or total training epochs within the main text, noting that 'Specific experimental details are given at GitHub repository.' |