Towards Discriminative Representation Learning for Speech Emotion Recognition
Authors: Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the public emotional benchmark database IEMOCAP and a large-scale realistic interaction database demonstrate that the proposed SER framework outperforms state-of-the-art techniques, with a 6.6% to 26.7% relative improvement in unweighted accuracy. |
| Researcher Affiliation | Collaboration | 1Graduate School at Shenzhen, Tsinghua University 2Dept. of Computer Science and Technology, Tsinghua University 3Search Technology Center Asia (STCA), Microsoft 4Dept. of Systems Engineering and Engineering Management, The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes its methods through textual explanations and architectural diagrams (Figure 1, 2, 4), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementations of this work are shared on a public website: https://github.com/thuhcsi/IJCAI2019-DRL4SER/ |
| Open Datasets | Yes | The public emotion benchmark database IEMOCAP [Busso et al., 2008] and real scene database RID are used in the experiments for performance evaluation. |
| Dataset Splits | Yes | Both IEMOCAP and RID are randomly shuffled and divided into three partitions in a proportion of 8:1:1 for training, validation, and testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models, CPU types, or detailed computing infrastructure. |
| Software Dependencies | No | The paper names TensorFlow as the deep learning framework and Adam as the optimizer, but it does not specify version numbers for these components or for any other supporting libraries. |
| Experiment Setup | Yes | In the proposed framework, the filters employed in the residual convolutional layers are depicted in Fig. 2, each Multi-head Self-attention block has 4 parallel heads, and each LSTM contains 256 units. The iteration count I in GCA-LSTM is empirically set to 3. The emotion classifier is constructed from three stacked dense layers, each containing 256 units. The initial learning rate is 10^-3. ... trained by stochastic optimization with 128 samples per batch. |
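The reported data preparation (random shuffle, then an 8:1:1 split of each database into training, validation, and test partitions) can be sketched as below. This is a minimal illustration, not the authors' released code; the function name `split_811` and the fixed seed are assumptions for reproducibility of the example.

```python
import numpy as np

def split_811(samples, seed=0):
    """Shuffle samples and split them 8:1:1 into train/validation/test,
    as described for IEMOCAP and RID in the paper."""
    rng = np.random.default_rng(seed)   # fixed seed is an assumption
    idx = rng.permutation(len(samples))
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = split_811(list(range(100)))  # 80 / 10 / 10 samples
```

Because the split is random rather than speaker-independent, exact reproduction of the reported numbers would also require the authors' shuffle order (or seed), which the paper does not give.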
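The classifier head described in the setup (three stacked 256-unit dense layers over the learned representation, processed in 128-sample batches) can be sketched as a plain NumPy forward pass. The input feature size `d_in`, the 4-class output, and all weight initializations are assumptions for illustration; the paper specifies only the layer count and width.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=True):
    """One fully connected layer with optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

# Three stacked 256-unit dense layers, then a softmax over 4 emotion
# classes (the class count is an assumption for this sketch).
d_in, d_h, n_cls = 512, 256, 4   # d_in is a placeholder feature size
W1, b1 = rng.standard_normal((d_in, d_h)) * 0.01, np.zeros(d_h)
W2, b2 = rng.standard_normal((d_h, d_h)) * 0.01, np.zeros(d_h)
W3, b3 = rng.standard_normal((d_h, d_h)) * 0.01, np.zeros(d_h)
Wo, bo = rng.standard_normal((d_h, n_cls)) * 0.01, np.zeros(n_cls)

def classify(x):
    h = dense(dense(dense(x, W1, b1), W2, b2), W3, b3)
    logits = h @ Wo + bo
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # softmax probabilities

probs = classify(rng.standard_normal((128, d_in)))  # one 128-sample batch
```

In the paper this head would be trained with Adam at an initial learning rate of 10^-3; the sketch only shows the forward structure, since the surrounding representation network is out of scope here.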