PGSS: Pitch-Guided Speech Separation
Authors: Xiang Li, Yiwen Wang, Yifan Sun, Xihong Wu, Jing Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the WSJ0-2mix corpus reveal that the proposed approaches can achieve higher pitch extraction accuracy and better separation performance, compared to the baseline models, and have the potential to be applied to SOTA architectures. |
| Researcher Affiliation | Academia | School of Intelligence Science and Technology, Peking University, Beijing, China; chenj@cis.pku.edu.cn |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1, Figure 2, Figure 3) but no explicit pseudocode or algorithm blocks with numbered steps or code-like formatting. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the described method, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | The proposed framework is evaluated on the Wall Street Journal (WSJ0) corpus. The WSJ0-2mix and -3mix datasets are the benchmarks designed for speech separation, introduced by Hershey et al. (2016). |
| Dataset Splits | Yes | For WSJ0-2mix, the 30h training set and the 10h validation set contain two-speaker mixtures generated by randomly selecting speakers and utterances from the WSJ0 training set si_tr_s, and mixing them at various Signal-to-Noise Ratios (SNRs) uniformly chosen between 0 dB and 5 dB. The 5h test set was similarly generated using utterances from 18 speakers from the WSJ0 validation set si_dt_05 and evaluation set si_et_05. |
| Hardware Specification | No | The paper mentions support from 'the High-performance Computing Platform of Peking University' but does not specify any exact GPU/CPU models, processor types, or memory details used for running experiments. |
| Software Dependencies | No | The paper mentions using 'Praat (Boersma 2001)' for reference pitch extraction but does not provide specific version numbers for any key software components, libraries, or frameworks used in the implementation of their models. |
| Experiment Setup | Yes | The input magnitudes are computed from the STFT with a 25 ms window length, a 10 ms hop size, and a Hann window. We quantize the frequency range from 60 to 404 Hz into 67 bins on a logarithmic scale, using 24 bins per octave. |
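The Dataset Splits row describes mixing two utterances at an SNR drawn uniformly from 0 to 5 dB. The sketch below illustrates that idea only; it is not the official WSJ0-2mix recipe. It assumes plain mean-power SNR (rather than whatever level normalization the official scripts apply) and truncation to the shorter utterance, and the function name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(s1: np.ndarray, s2: np.ndarray, snr_db: float):
    """Scale s2 so that 10*log10(P(s1) / P(s2_scaled)) == snr_db, then sum.

    Simplified sketch (assumptions): mean-power SNR, no active-level
    normalization, both signals truncated to the shorter length.
    """
    n = min(len(s1), len(s2))            # truncate to the shorter utterance
    s1, s2 = s1[:n], s2[:n]
    p1 = np.mean(s1 ** 2)                # mean power of the first speaker
    p2 = np.mean(s2 ** 2)                # mean power of the second speaker
    gain = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    s2_scaled = gain * s2
    return s1 + s2_scaled, s1, s2_scaled

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(16000)       # stand-ins for two utterances
    b = 3.0 * rng.standard_normal(12000)
    snr = rng.uniform(0.0, 5.0)          # SNR drawn uniformly from [0, 5] dB
    mix, src1, src2 = mix_at_snr(a, b, snr)
    print(mix.shape, round(snr, 3))
```

Returning the scaled sources alongside the mixture mirrors how separation datasets store the reference signals needed for training targets and evaluation.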
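The Experiment Setup row specifies the STFT front end (25 ms window, 10 ms hop, Hann window). Below is a minimal NumPy sketch of that magnitude computation. The row does not state the sampling rate or FFT size, so this sketch assumes 8 kHz audio (a common WSJ0-2mix setting), an FFT size equal to the window length, a periodic Hann window, and no edge padding. The name `stft_magnitude` is hypothetical.

```python
import numpy as np

def stft_magnitude(x: np.ndarray, sr: int = 8000,
                   win_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Return |STFT| with shape (num_frames, win_len // 2 + 1).

    Assumptions: sr = 8 kHz by default, FFT size = window length,
    periodic Hann window, frames taken without padding.
    """
    win_len = int(round(sr * win_ms / 1000))  # 25 ms -> 200 samples at 8 kHz
    hop = int(round(sr * hop_ms / 1000))      # 10 ms -> 80 samples at 8 kHz
    if len(x) < win_len:
        raise ValueError("signal shorter than one analysis window")
    n = np.arange(win_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)  # periodic Hann
    # Overlapping frames: every hop-th window of length win_len.
    frames = np.lib.stride_tricks.sliding_window_view(x, win_len)[::hop]
    spec = np.fft.rfft(frames * window, axis=-1)
    return np.abs(spec)

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)      # 1 s, 1 kHz test tone
    mag = stft_magnitude(tone, sr)
    print(mag.shape)                          # frames x frequency bins
```

At 8 kHz, each bin spans 8000 / 200 = 40 Hz, so a 1 kHz tone peaks in bin 25.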
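The pitch quantization in the Experiment Setup row is internally consistent. With 24 bins per octave starting at 60 Hz, bin k has center 60 · 2^(k/24). Bin 66 then sits at 60 · 2^2.75 ≈ 403.6 Hz, which gives 67 bins (k = 0..66) covering 60 to about 404 Hz. The sketch below maps f0 values (for example, from a reference tracker such as Praat) to bin indices by nearest log-frequency bin. The treatment of unvoiced frames is not described in this row, so the `-1` marker is a hypothetical convention. `quantize_f0` and `bin_centers` are illustrative names only.

```python
import numpy as np

F_MIN = 60.0          # lowest bin center in Hz (from the paper's setup)
BINS_PER_OCTAVE = 24
N_BINS = 67

def bin_centers() -> np.ndarray:
    """Center frequency of each log-spaced pitch bin, in Hz."""
    return F_MIN * 2.0 ** (np.arange(N_BINS) / BINS_PER_OCTAVE)

def quantize_f0(f0) -> np.ndarray:
    """Map f0 values in Hz to the nearest bin index in [0, N_BINS - 1].

    Frames with f0 <= 0 are treated as unvoiced and mapped to -1
    (a hypothetical convention for this sketch).
    """
    f0 = np.asarray(f0, dtype=float)
    idx = np.full(f0.shape, -1, dtype=int)
    voiced = f0 > 0
    k = np.rint(BINS_PER_OCTAVE * np.log2(f0[voiced] / F_MIN)).astype(int)
    idx[voiced] = np.clip(k, 0, N_BINS - 1)  # clamp out-of-range pitches
    return idx

if __name__ == "__main__":
    print(round(float(bin_centers()[-1]), 1))
    print(quantize_f0([0.0, 60.0, 120.0, 404.0]))
```

Rounding in the log2 domain keeps the quantization error roughly constant in musical terms (at most a quarter-tone, i.e. half of a 1/24-octave bin), which is the usual motivation for log-spaced pitch classes.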