Towards Discriminative Representation Learning for Speech Emotion Recognition
Authors: Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the public emotional benchmark database IEMOCAP and a large-scale realistic interaction database demonstrate that the proposed SER framework outperforms state-of-the-art techniques, with a 6.6% to 26.7% relative improvement in unweighted accuracy. |
| Researcher Affiliation | Collaboration | 1Graduate School at Shenzhen, Tsinghua University 2Dept. of Computer Science and Technology, Tsinghua University 3Search Technology Center Asia (STCA), Microsoft 4Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes its methods through textual explanations and architectural diagrams (Figure 1, 2, 4), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementations of this work are shared on a public website: https://github.com/thuhcsi/IJCAI2019-DRL4SER/ |
| Open Datasets | Yes | The public emotion benchmark database IEMOCAP [Busso et al., 2008] and real scene database RID are used in the experiments for performance evaluation. |
| Dataset Splits | Yes | Both IEMOCAP and RID are randomly shuffled and divided into three partitions in a proportion of 8:1:1 for training, validation, and testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models, CPU types, or detailed computing infrastructure. |
| Software Dependencies | No | The paper names TensorFlow as the deep learning framework and Adam as the optimizer, but it does not specify version numbers for these components or for any other supporting libraries. |
| Experiment Setup | Yes | In the proposed framework, the filters employed in the residual convolutional layers are depicted in Fig. 2, each Multi-head Self-attention block has 4 parallel heads, and each LSTM contains 256 units. The iteration count I in GCA-LSTM is empirically set to 3. The emotion classifier is constructed from three stacked dense layers, each containing 256 units. The initial learning rate is 10^-3. ... trained by stochastic optimization with 128 samples per batch. |
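The reported data preparation (random shuffle, then an 8:1:1 split of each database into training, validation, and test partitions) can be sketched as below. This is a minimal illustration, not the authors' released code; the function name `split_811` and the fixed seed are assumptions for reproducibility of the example.

```python
import numpy as np

def split_811(samples, seed=0):
    """Shuffle samples and split them 8:1:1 into train/validation/test,
    as described for IEMOCAP and RID in the paper."""
    rng = np.random.default_rng(seed)   # fixed seed is an assumption
    idx = rng.permutation(len(samples))
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = split_811(list(range(100)))  # 80 / 10 / 10 samples
```

Because the split is random rather than speaker-independent, exact reproduction of the reported numbers would also require the authors' shuffle order (or seed), which the paper does not give.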
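The classifier head described in the setup (three stacked 256-unit dense layers over the learned representation, processed in 128-sample batches) can be sketched as a plain NumPy forward pass. The input feature size `d_in`, the 4-class output, and all weight initializations are assumptions for illustration; the paper specifies only the layer count and width.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=True):
    """One fully connected layer with optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

# Three stacked 256-unit dense layers, then a softmax over 4 emotion
# classes (the class count is an assumption for this sketch).
d_in, d_h, n_cls = 512, 256, 4   # d_in is a placeholder feature size
W1, b1 = rng.standard_normal((d_in, d_h)) * 0.01, np.zeros(d_h)
W2, b2 = rng.standard_normal((d_h, d_h)) * 0.01, np.zeros(d_h)
W3, b3 = rng.standard_normal((d_h, d_h)) * 0.01, np.zeros(d_h)
Wo, bo = rng.standard_normal((d_h, n_cls)) * 0.01, np.zeros(n_cls)

def classify(x):
    h = dense(dense(dense(x, W1, b1), W2, b2), W3, b3)
    logits = h @ Wo + bo
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # softmax probabilities

probs = classify(rng.standard_normal((128, d_in)))  # one 128-sample batch
```

In the paper this head would be trained with Adam at an initial learning rate of 10^-3; the sketch only shows the forward structure, since the surrounding representation network is out of scope here.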