Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines.
Researcher Affiliation | Collaboration | 1 NVIDIA; 2 Research Center for Information Technology Innovation, Academia Sinica
Pseudocode | No | The paper describes its methods in text and diagrams (Figure 1) but includes no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the code and pre-trained models will be released, but they are not yet available.
Open Datasets | Yes | The VQ-VAE for quality estimation was trained on the LibriSpeech clean 460 hours (Panayotov et al., 2015).
Dataset Splits | Yes | The VoiceBank-DEMAND noisy test set (Valentini-Botinhao et al., 2016) was selected as the validation set.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models or memory) used to run its experiments.
Software Dependencies | No | The paper mentions software components such as TorchAudio and the Whisper ASR, but provides no version numbers for these or other key dependencies required to replicate the experiments.
Experiment Setup | Yes | The model structure is shown in Figure 1: the codebook size V and code dimension d are set to (2048, 32), (c1, c2) = (128, 64), and the commitment weight β was set to 1.0 for quality estimation and 3.0 for SE, based on validation-set performance.
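To make the quoted setup concrete, below is a minimal numpy sketch of the vector-quantization step at the heart of a VQ-VAE, using the hyperparameters reported above (codebook size V = 2048, code dimension d = 32, commitment weight β = 1.0). Everything beyond those three numbers is illustrative; this is not the authors' implementation, and numpy has no autograd, so the stop-gradient behavior of the real codebook/commitment losses is only noted in comments.

```python
import numpy as np

# Hyperparameters quoted from the paper's setup; the rest of this
# sketch is an illustrative stand-in, not the authors' code.
V, d = 2048, 32   # codebook size and code dimension
beta = 1.0        # commitment weight (1.0 for quality estimation, 3.0 for SE)

rng = np.random.default_rng(0)
codebook = rng.standard_normal((V, d))  # learnable codes e_1 .. e_V

def quantize(z_e):
    """Map each encoder output vector to its nearest codebook entry.

    z_e: (T, d) encoder outputs. Returns the quantized vectors (T, d)
    and the selected codebook indices (T,).
    """
    # Squared Euclidean distance from every frame to every code: (T, V)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

def vq_losses(z_e, z_q):
    """Standard VQ-VAE codebook + beta-weighted commitment losses.

    In a real implementation the codebook loss stops gradients through
    the encoder and the commitment loss stops gradients through the
    codebook; here both are plain MSE terms for illustration.
    """
    codebook_loss = ((z_q - z_e) ** 2).mean()            # pulls codes toward encoder outputs
    commitment_loss = beta * ((z_e - z_q) ** 2).mean()   # pulls encoder outputs toward codes
    return codebook_loss + commitment_loss

z_e = rng.standard_normal((10, d))  # 10 frames of (fake) encoder output
z_q, idx = quantize(z_e)
loss = vq_losses(z_e, z_q)
```

The quantization error that this step produces on a given utterance is also the intuition behind a quality score like VQScore: speech that the clean-trained codebook reconstructs poorly is likely to be degraded.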