Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines. |
| Researcher Affiliation | Collaboration | 1 NVIDIA, 2 Research Center for Information Technology Innovation, Academia Sinica |
| Pseudocode | No | The paper describes its methods through text and diagrams (Figure 1) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states only that "the code and pre-trained models will be released"; no repository link is provided. |
| Open Datasets | Yes | The training data used to train our VQ-VAE for quality estimation was the LibriSpeech clean 460 hours (Panayotov et al., 2015). |
| Dataset Splits | Yes | The VoiceBank-DEMAND noisy test set (Valentini-Botinhao et al., 2016) was selected as the validation set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details, or specific computer specifications) used to run its experiments. |
| Software Dependencies | No | The paper mentions some software components like TorchAudio and Whisper ASR, but it does not provide specific version numbers for these or other key software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | The model structure is shown in Figure 1, where the codebook size V and code dimension d are set to (2048, 32), and (c1, c2) = (128, 64). ...the commitment weight β was set to 1.0 and 3.0 for quality estimation and SE, respectively, based on performance on the validation set. |
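The setup row above quotes the VQ-VAE hyperparameters (codebook size V = 2048, code dimension d = 32, commitment weight β = 1.0 or 3.0). A minimal sketch of the vector-quantization step these values configure is shown below, using NumPy for illustration; this is not the authors' implementation, and the random codebook initialization and dummy batch are assumptions made purely for the example.

```python
import numpy as np

# Hyperparameters reported in the paper: codebook size V, code dimension d,
# and commitment weight beta (1.0 for quality estimation, 3.0 for SE).
V, d = 2048, 32
BETA_QE, BETA_SE = 1.0, 3.0

rng = np.random.default_rng(0)
# Illustrative random codebook init (the paper does not specify this detail).
codebook = rng.normal(size=(V, d)).astype(np.float32)

def quantize(z, codebook):
    """Nearest-neighbor lookup: map each encoder output vector to its closest code."""
    # Squared Euclidean distance between each row of z and each codebook entry.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def commitment_loss(z, z_q, beta):
    """VQ-VAE commitment term beta * ||z - sg(z_q)||^2 (stop-gradient omitted here)."""
    return beta * float(((z - z_q) ** 2).mean())

# Dummy batch of 4 encoder output vectors, for illustration only.
z = rng.normal(size=(4, d)).astype(np.float32)
z_q, idx = quantize(z, codebook)
loss_qe = commitment_loss(z, z_q, BETA_QE)  # quality-estimation setting, beta = 1.0
loss_se = commitment_loss(z, z_q, BETA_SE)  # speech-enhancement setting, beta = 3.0
```

Note that the only difference between the two settings in this sketch is the commitment weight: the SE loss is exactly three times the quality-estimation loss for the same inputs.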