reproducibilityindex.ai

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Authors: Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions.
Researcher Affiliation	Collaboration	1 Inner Mongolia University, China; 2 ByteDance; 3 Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; 4 National University of Singapore, Singapore
Pseudocode	No	The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code	Yes	Code and audio samples can be found at: https://github.com/walkerhyf/ECSS.
Open Datasets	Yes	We validate the ECSS on a recently public dataset for conversational speech synthesis called DailyTalk (Lee, Park, and Kim 2023).
Dataset Splits	Yes	We partition the data into training, validation, and test sets at a ratio of 8:1:1.
Hardware Specification	Yes	The model is trained on a Tesla V100 GPU with a batch size of 16 and 600k steps.
Software Dependencies	No	The paper mentions using pre-trained models like BERT and HiFi-GAN, and a G2P toolkit, but does not provide specific version numbers for these software components or any programming language and library versions (e.g., Python, PyTorch).
Experiment Setup	Yes	In the heterogeneous graph-based emotion context encoder, the dimension of the text node representation fuj is set to 512, and the dimensions of the remaining type node representations fej, fij, fsj, and faj are all set to 256. For multi-head attention-based methods, we set the head number as 8. ... The model is trained on a Tesla V100 GPU with a batch size of 16 and 600k steps. ... More detailed experimental settings are accessed in the Appendix section.
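
The Dataset Splits row reports an 8:1:1 partition but does not say how items were assigned to each set. The following is a minimal sketch of one way to reproduce such a split, assuming a seeded random shuffle over dialogue-level IDs. The function name `split_8_1_1`, the seed, and the choice to split by dialogue rather than by utterance are illustrative assumptions, not details taken from the paper or its repository.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items with a fixed seed and partition them 8:1:1.

    Assumption: a seeded random shuffle; the paper only states the ratio.
    """
    items = list(items)
    rng = random.Random(seed)  # local RNG so the global state is untouched
    rng.shuffle(items)

    n = len(items)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_val = n // 10

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Hypothetical dialogue IDs, used only for illustration.
    dialogue_ids = range(100)
    train, val, test = split_8_1_1(dialogue_ids)
    print(len(train), len(val), len(test))
```

Splitting at the dialogue level keeps every utterance of a conversation in the same set, which matters when the model conditions on dialogue history. Any remainder from the integer division falls into the test set.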