SSDM: Scalable Speech Dysfluency Modeling

Authors: Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Paul Baquirin, Zachary Miller, Maria Luisa Gorno Tempini, Gopala Anumanchipalli

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate phonetic transcription (forced alignment) performance using simulated data from VCTK++ [1] and our proposed Libri-Dys dataset. The framewise F1 score and dPER [1] are used as evaluation metrics. Five types of training data are used: VCTK++, LibriTTS (100%) [106], Libri-Dys (30%), Libri-Dys (60%), and Libri-Dys (100%). (A framewise-F1 sketch appears after the table.)
Researcher Affiliation | Academia | Jiachen Lian¹, Xuanru Zhou², Zoe Ezzes³, Jet Vonk³, Brittany Morin³, David Baquirin³, Zachary Miller³, Maria Luisa Gorno Tempini³, Gopala Anumanchipalli¹ (¹UC Berkeley, ²Zhejiang University, ³UCSF)
Pseudocode | Yes | Algorithm 1: Find Longest Common Subsequence (LCS). (A runnable LCS sketch appears after the table.)
Open Source Code | No | For code, we are waiting for further approval.
Open Datasets | Yes | Data is open-sourced at https://bit.ly/4aoLdWU.
Dataset Splits | No | For training, we use the VCTK++ [1] and Libri-Dys datasets. For testing, we randomly sample 10% of the training data. The paper does not explicitly describe a separate validation split or how it is derived. (A split sketch appears after the table.)
Hardware Specification | Yes | The training is conducted using two A6000 GPUs.
Software Dependencies | No | The paper mentions software such as WavLM, the Glow algorithm, and the Adam optimizer, but does not provide version numbers for general software dependencies (e.g., Python or PyTorch/TensorFlow versions).
Experiment Setup | Yes | In Eq. 2, τ = 2. In Eq. 4, a = b = 1 and m_row = 3. In Eq. 6 and Eq. 7, we simply set K1 = K2 = 1. In Eq. 8, λ1 = λ2 = λ3 = 1. In Eq. 12 and Eq. 13, δ = 0.9. ... We use the Adam optimizer and decay the learning rate from 0.001 by a factor of 0.9 every 10 steps until convergence. (A PyTorch sketch of this recipe appears after the table.)
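
To make the framewise F1 metric from the Research Type row concrete, here is a minimal sketch. It assumes both the reference and predicted alignments have been expanded to one phoneme label per frame; the toy label sequences and the micro-averaging mode are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical framewise F1 for forced alignment: one phoneme label per frame.
from sklearn.metrics import f1_score

ref_frames = ["sil", "k", "k", "ae", "ae", "t", "sil"]  # reference alignment
hyp_frames = ["sil", "k", "ae", "ae", "ae", "t", "sil"]  # predicted alignment

# Micro-averaged F1 over frames (an assumption; with a single label per
# frame this reduces to frame accuracy, here 6/7 matching frames).
print(f"framewise F1: {f1_score(ref_frames, hyp_frames, average='micro'):.3f}")
```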
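
The Pseudocode row names Algorithm 1, Find Longest Common Subsequence. Below is a minimal dynamic-programming version, assuming Algorithm 1 follows the standard textbook formulation; since the paper aligns phoneme sequences, the example uses phoneme tokens.

```python
# Standard DP formulation of LCS (assumed to match Algorithm 1's intent).
def lcs(a, b):
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack through the table to recover one LCS.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

print(lcs(["k", "ae", "ae", "t"], ["k", "ae", "t", "s"]))  # ['k', 'ae', 't']
```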
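
The Dataset Splits row describes holding out a random 10% of the training data for testing. The sketch below reconstructs that split; the seed and the placeholder utterance IDs are assumptions, and no validation split is modeled because the paper does not describe one.

```python
# Illustrative 90/10 train/test split over utterance IDs (IDs and seed are
# placeholders, not from the paper).
import random

utterances = [f"utt_{i:05d}" for i in range(1000)]
random.seed(0)
random.shuffle(utterances)
cut = int(0.1 * len(utterances))
test_set, train_set = utterances[:cut], utterances[cut:]
print(len(train_set), len(test_set))  # 900 100
```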
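
The optimization recipe in the Experiment Setup row maps directly onto PyTorch: Adam starting at lr = 0.001, decayed by a factor of 0.9 every 10 steps. Whether a "step" is an optimizer update or an epoch is an assumption here, as is the stand-in model and dummy loss.

```python
# Sketch of the stated recipe: Adam, lr 1e-3, StepLR decay of 0.9 every 10 steps.
import torch

model = torch.nn.Linear(16, 4)  # stand-in for the actual SSDM model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.9)

for step in range(30):
    loss = model(torch.randn(8, 16)).pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()  # lr becomes 9e-4 after step 10, 8.1e-4 after step 20, ...
```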