Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Authors: Xiaolin Hu, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme. In addition, the proposed model achieved a good balance between speech separation accuracy and computational efficiency compared to other state-of-the-art models on three benchmark datasets.
Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China; 2 Department of Electrical Engineering, Columbia University, NY, USA; 3 Department of Informatics, University of Hamburg, Hamburg, Germany
Pseudocode | No | The paper describes the model architecture and updating schemes using diagrams (Figure 1, Figure 3), but it does not provide any pseudocode or algorithm blocks.
Open Source Code | Yes | The PyTorch implementation of the models is publicly available. It is based on the code of SuDoRM-RF. This project is MIT licensed. (Footnote links to https://cslikai.cn/project/AFRCNN)
Open Datasets | Yes | Libri2Mix [2]: this dataset was constructed using the train-100, train-360, dev, and test sets of the LibriSpeech dataset [25]. ... WSJ0-2Mix [7]: this dataset contains a 30-hour training set, a 10-hour validation set, and a 5-hour test set. ... WHAM! [37]: WHAM! added noise to WSJ0-2Mix.
Dataset Splits | Yes | WSJ0-2Mix [7]: this dataset contains a 30-hour training set, a 10-hour validation set, and a 5-hour test set.
Hardware Specification | Yes | All experiments were conducted on a server with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz and GeForce RTX 1080 Ti 11G GPUs (×8).
Software Dependencies | No | The paper states that 'The PyTorch implementation of the models is publicly available', but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We trained all models for 200 epochs on 3-second utterances for Libri2Mix and 4-second utterances for WHAM! and WSJ0-2Mix, with an 8 kHz sampling rate. Batch size was set to 8. The initial learning rate of the Adam optimizer was 1 × 10⁻³, and it decayed to 1/3 of the previous rate every 40 epochs. During training, gradient clipping with a maximum l2-norm of 5 was used.
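
For a concrete picture of the quoted hyperparameters, the following is a minimal PyTorch-style training-loop sketch that encodes only the settings listed in the Experiment Setup row. The model, data, and loss here are hypothetical placeholders (the actual A-FRCNN architecture and training objective are defined in the authors' repository), so this illustrates the quoted configuration rather than the authors' training script.

import torch
from torch import nn, optim
from torch.nn.utils import clip_grad_norm_

# Placeholder model standing in for A-FRCNN (the real architecture lives in the authors' repository).
model = nn.Conv1d(1, 2, kernel_size=3, padding=1)  # maps a 1-channel mixture to 2 estimated sources

# Settings quoted in the experiment setup above.
optimizer = optim.Adam(model.parameters(), lr=1e-3)                          # initial learning rate 1e-3
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=1 / 3)  # decay to 1/3 every 40 epochs

for epoch in range(200):                               # 200 training epochs
    for _ in range(10):                                # stand-in for the real Libri2Mix / WSJ0-2Mix data loader
        mixture = torch.randn(8, 1, 3 * 8000)          # batch size 8, 3-second utterances at 8 kHz
        sources = torch.randn(8, 2, 3 * 8000)          # hypothetical reference sources
        estimate = model(mixture)
        loss = ((estimate - sources) ** 2).mean()      # placeholder loss; the actual objective is not quoted here
        optimizer.zero_grad()
        loss.backward()
        clip_grad_norm_(model.parameters(), max_norm=5.0)  # gradient clipping with maximum l2-norm of 5
        optimizer.step()
    scheduler.step()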