Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Automatic and human evaluations on the DailyTalk dataset demonstrate that our approach effectively generates natural-sounding spoken responses, surpassing previous and cascaded baselines. |
| Researcher Affiliation | Collaboration | 1. Data Science and AI Lab, Department of ECE, Seoul National University; 2. NAVER Cloud; 3. NAVER AI Lab; 4. Artificial Intelligence Institute, Seoul National University; 5. ASRI, INMC, ISRC, and Interdisciplinary Program in AI, Seoul National University |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm". Figure 5 shows a fine-tuning template, not pseudocode. |
| Open Source Code | Yes | Our code and checkpoints are available at https://github.com/naverai/usdm. |
| Open Datasets | Yes | DailyTalk [70] |
| Dataset Splits | Yes | We follow the train/test split of Lee et al. [70] and preprocess the data for single-turn spoken dialog. As a result, we obtain a total of 20,117 training samples and 1,058 test samples. |
| Hardware Specification | Yes | 64 NVIDIA A100-40GB GPUs |
| Software Dependencies | No | The paper mentions several models and tools, some with links to their repositories (Table 6), but it does not provide specific version numbers for general software components such as Python, PyTorch, or CUDA, which are typically required for full reproducibility. |
| Experiment Setup | Yes | Batch size of 256. We use the Adam optimizer [73] with a learning rate of 10⁻⁴. |
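The Experiment Setup row reports Adam [73] with a learning rate of 10⁻⁴. For reference, a single Adam update step with that learning rate can be sketched in pure Python; this is a minimal illustrative sketch of the standard Adam rule with its usual default betas, not the authors' training code, and the function name `adam_step` is hypothetical.

```python
def adam_step(theta, grad, m, v, t,
              lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta at step t (t >= 1).

    Learning rate defaults to 1e-4, matching the setup reported above;
    beta1/beta2/eps are Adam's customary defaults (an assumption here).
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v


# Example: first update step from theta = 0.5 with gradient 0.1.
theta, m, v = adam_step(0.5, 0.1, m=0.0, v=0.0, t=1)
```

At step 1 the bias-corrected update magnitude is approximately the learning rate itself, so the parameter moves by roughly 10⁻⁴ regardless of gradient scale, one reason Adam is a common default for fine-tuning runs like the one reported.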