SECap: Speech Emotion Captioning with Large Language Model

Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shi-Xiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu

Venue: AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results of objective and subjective evaluations demonstrate that: 1) the SECap framework outperforms the HTSAT-BART baseline in all objective evaluations; 2) SECap can generate high-quality speech emotion captions that attain performance on par with human annotators in subjective mean opinion score tests.
Researcher Affiliation | Collaboration | (1) Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; (2) Tencent AI Lab; (3) The Chinese University of Hong Kong, Hong Kong SAR, China
Pseudocode | No | The paper describes the model architecture and training processes with mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Codes, models and results: https://github.com/thuhcsi/SECap
Open Datasets | Yes | Due to the lack of publicly available SEC datasets, we utilize an internal dataset called EMOSpeech. ... Please refer to the project's GitHub repository for the detailed dataset construction process, where the test set is also publicly available.
Dataset Splits | Yes | Upon constructing the EMOSpeech dataset, we randomly select 600 sentences for testing, 600 sentences for validation, and the remaining 29,326 sentences for training. (A minimal split sketch appears after this table.)
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., specific GPU/CPU models, memory amounts) used for running experiments.
Software Dependencies | No | The paper mentions software components and cites associated research papers (e.g., 'LLaMA (Cui, Yang, and Yao 2023)', 'HuBERT (Hsu et al. 2021)', 'BERT-base (Devlin et al. 2019)'), but it does not provide explicit version numbers for the underlying software libraries or frameworks (e.g., PyTorch version, Python version).
Experiment Setup | No | The paper describes the general training process (two-stage training with frozen parameters and pre-trained model initialization) but does not explicitly state specific hyperparameter values such as learning rates, batch sizes, or total training epochs in the main text, noting that 'Specific experimental details are given at GitHub repository.' (An illustrative pipeline sketch follows the table.)
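
The 600/600/29,326 split reported above implies 30,526 utterances in total and can be reproduced with a seeded random selection. Below is a minimal sketch, assuming utterance IDs are available as a sequence; split_emospeech is a hypothetical helper, not code from the authors' repository.

    import random

    def split_emospeech(utterance_ids, n_test=600, n_val=600, seed=0):
        """Seeded random split into train/validation/test partitions.

        Hypothetical helper illustrating the 600/600/29,326 split described
        in the paper; the actual procedure lives in the authors' GitHub repo.
        """
        ids = list(utterance_ids)
        random.Random(seed).shuffle(ids)      # deterministic shuffle
        test = ids[:n_test]
        val = ids[n_test:n_test + n_val]
        train = ids[n_test + n_val:]          # remaining sentences go to training
        return train, val, test

    # Example with dummy IDs: 30,526 utterances -> 29,326 / 600 / 600.
    train, val, test = split_emospeech(range(30526))
    assert (len(train), len(val), len(test)) == (29326, 600, 600)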
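
For orientation, the components named in the Software Dependencies and Experiment Setup rows (a frozen HuBERT speech encoder, a BERT-initialized bridge, and a LLaMA decoder, trained in two stages with frozen parameters) suggest a prefix-conditioning pipeline. The PyTorch sketch below illustrates that general pattern only; it is not the authors' implementation, and all module names, dimensions, and layer counts are assumptions.

    import torch
    import torch.nn as nn

    class AudioBridge(nn.Module):
        """Learnable-query bridge mapping speech-encoder features to a
        fixed-length soft prefix for a frozen language model (hypothetical
        sketch; dimensions and depth are illustrative)."""

        def __init__(self, audio_dim=1024, llm_dim=4096, n_query=32):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, llm_dim)
            layer = nn.TransformerDecoderLayer(d_model=llm_dim, nhead=8,
                                               batch_first=True)
            self.bridge = nn.TransformerDecoder(layer, num_layers=2)
            self.query = nn.Parameter(torch.randn(n_query, llm_dim))

        def forward(self, audio_feats):
            # audio_feats: (B, T, audio_dim) from a frozen speech encoder
            # such as HuBERT-large (feature dim 1024).
            memory = self.audio_proj(audio_feats)
            queries = self.query.unsqueeze(0).expand(audio_feats.size(0), -1, -1)
            # Cross-attend the learnable queries to the audio features; the
            # (B, n_query, llm_dim) output would be prepended to the LLM input.
            return self.bridge(queries, memory)

    bridge = AudioBridge()
    feats = torch.randn(2, 250, 1024)   # ~5 s of 50 Hz encoder frames
    prefix = bridge(feats)              # (2, 32, 4096)

In such a setup only the bridge (and any prompt embeddings) would receive gradients, which is consistent with the frozen-parameter, two-stage description quoted above.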