Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
Authors: Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several different tasks including speech separation and multi-speaker speech recognition show that our conditional multi-sequence models lead to consistent improvements over the conventional non-conditional models. |
| Researcher Affiliation | Collaboration | 1Center for Language and Speech Processing, Johns Hopkins University, U.S.A.; 2Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China; 3ASLP@NPU, School of Computer Science, Northwestern Polytechnical University, Xi'an, China; 4Hitachi, Ltd. Research & Development Group, Japan |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code and Supplementary Material could be available on our webpage: https://demotoshow.github.io/. |
| Open Datasets | Yes | For the speech mixtures, i.e., the input O for our tasks, with different numbers of speakers, data from the Wall Street Journal (WSJ) corpus is used. In the two-speaker scenario, we use the common benchmark called WSJ0-2mix dataset introduced in [15]. |
| Dataset Splits | Yes | The 30 h training set and the 10 h validation set contain two-speaker mixtures generated by randomly selecting speakers and utterances from the WSJ0 training set si_tr_s, and mixing them at various signal-to-noise ratios (SNRs) uniformly chosen between 0 dB and 10 dB. The 5 h test set was similarly generated using utterances from 18 speakers from the WSJ0 validation set si_dt_05 and evaluation set si_et_05. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like CTC and refers to ESPnet, but does not provide specific version numbers for software dependencies (e.g., library or solver names with versions). |
| Experiment Setup | No | In the Section A of Supplementary Material, we provide the implementation details about all our experiments, and we also extend our model to one iterative speech denoising task in Section D. |
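The dataset-splits entry describes mixtures built by scaling one utterance against another at an SNR drawn uniformly from [0, 10] dB. A minimal sketch of that mixing step is shown below; the function name, the power-based scaling, and the synthetic signals are illustrative assumptions, not details taken from the paper or the WSJ0-2mix scripts.

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Mix two equal-length signals so that s1 is snr_db louder than s2."""
    # Scale s2 so the power ratio p1/p2 equals 10^(snr_db/10).
    p1 = np.mean(s1 ** 2)
    p2 = np.mean(s2 ** 2)
    target_p2 = p1 / (10 ** (snr_db / 10))
    s2_scaled = s2 * np.sqrt(target_p2 / p2)
    return s1 + s2_scaled

rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)  # stand-in for one utterance (1 s @ 16 kHz)
s2 = rng.standard_normal(16000)  # stand-in for the second utterance
snr = rng.uniform(0.0, 10.0)     # SNR drawn uniformly from [0, 10] dB, as described
mixture = mix_at_snr(s1, s2, snr)
```

In the actual WSJ0-2mix recipe the sources and gains come from the released generation scripts; this sketch only illustrates the uniform-SNR mixing idea stated in the quoted evidence.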