Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Automatic and human evaluations on the DailyTalk dataset demonstrate that our approach effectively generates natural-sounding spoken responses, surpassing previous and cascaded baselines. |
| Researcher Affiliation | Collaboration | 1Data Science and AI Lab, Department of ECE, Seoul National University 2NAVER Cloud 3NAVER AI Lab 4Artificial Intelligence Institute, Seoul National University 5ASRI, INMC, ISRC, and Interdisciplinary Program in AI, Seoul National University |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm". Figure 5 shows a fine-tuning template, not pseudocode. |
| Open Source Code | Yes | Our code and checkpoints are available at https://github.com/naverai/usdm. |
| Open Datasets | Yes | DailyTalk [70] |
| Dataset Splits | Yes | We follow the train/test split of Lee et al. [70] and preprocess the data for single-turn spoken dialog. As a result, we obtain a total of 20,117 training samples and 1,058 test samples. |
| Hardware Specification | Yes | 64 NVIDIA A100-40GB GPUs |
| Software Dependencies | No | The paper mentions several models and tools, some with links to their repositories (Table 6), but it does not provide specific version numbers for general software components like Python, PyTorch, or CUDA, which are typically required for full reproducibility. |
| Experiment Setup | Yes | batch size of 256. We use the Adam optimizer [73] with a learning rate of 10⁻⁴. |
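
The reported setup (Adam optimizer, learning rate 10⁻⁴, batch size 256) can be sketched as follows. This is a plain-Python illustration of the standard Adam update rule with those hyperparameters, not the authors' training code; the function and variable names are hypothetical.

```python
# Hyperparameters as reported in the paper.
LEARNING_RATE = 1e-4
BATCH_SIZE = 256

def adam_step(theta, grad, m, v, t, lr=LEARNING_RATE,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (Kingma & Ba, 2015)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy example: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)  # moves steadily toward 0 at roughly lr per early step
```

With a learning rate of 10⁻⁴ the per-step movement is small, which matches the paper's large effective batch (256) spread over 64 A100 GPUs.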