Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines.
Researcher Affiliation | Collaboration | 1 NVIDIA; 2 Research Center for Information Technology Innovation, Academia Sinica
Pseudocode | No | The paper describes its methods in text and diagrams (Figure 1) but includes no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the code and pre-trained models will be released, but they are not yet available.
Open Datasets | Yes | The VQ-VAE for quality estimation was trained on the LibriSpeech clean 460 hours (Panayotov et al., 2015).
Dataset Splits | Yes | The VoiceBank-DEMAND noisy test set (Valentini-Botinhao et al., 2016) was selected as the validation set.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models or memory) used to run its experiments.
Software Dependencies | No | The paper mentions software components such as TorchAudio and the Whisper ASR, but provides no version numbers for these or other key dependencies required to replicate the experiments.
Experiment Setup | Yes | The model structure is shown in Figure 1: the codebook size V and code dimension d are set to (2048, 32), (c1, c2) = (128, 64), and the commitment weight β was set to 1.0 for quality estimation and 3.0 for SE, based on validation-set performance.
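To make the quoted setup concrete, below is a minimal numpy sketch of the vector-quantization step at the heart of a VQ-VAE, using the hyperparameters reported above (codebook size V = 2048, code dimension d = 32, commitment weight β = 1.0). Everything beyond those three numbers is illustrative; this is not the authors' implementation, and numpy has no autograd, so the stop-gradient behavior of the real codebook/commitment losses is only noted in comments.

```python
import numpy as np

# Hyperparameters quoted from the paper's setup; the rest of this
# sketch is an illustrative stand-in, not the authors' code.
V, d = 2048, 32   # codebook size and code dimension
beta = 1.0        # commitment weight (1.0 for quality estimation, 3.0 for SE)

rng = np.random.default_rng(0)
codebook = rng.standard_normal((V, d))  # learnable codes e_1 .. e_V

def quantize(z_e):
    """Map each encoder output vector to its nearest codebook entry.

    z_e: (T, d) encoder outputs. Returns the quantized vectors (T, d)
    and the selected codebook indices (T,).
    """
    # Squared Euclidean distance from every frame to every code: (T, V)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

def vq_losses(z_e, z_q):
    """Standard VQ-VAE codebook + beta-weighted commitment losses.

    In a real implementation the codebook loss stops gradients through
    the encoder and the commitment loss stops gradients through the
    codebook; here both are plain MSE terms for illustration.
    """
    codebook_loss = ((z_q - z_e) ** 2).mean()            # pulls codes toward encoder outputs
    commitment_loss = beta * ((z_e - z_q) ** 2).mean()   # pulls encoder outputs toward codes
    return codebook_loss + commitment_loss

z_e = rng.standard_normal((10, d))  # 10 frames of (fake) encoder output
z_q, idx = quantize(z_e)
loss = vq_losses(z_e, z_q)
```

The quantization error that this step produces on a given utterance is also the intuition behind a quality score like VQScore: speech that the clean-trained codebook reconstructs poorly is likely to be degraded.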