Deep Residual-Dense Lattice Network for Speech Enhancement
Authors: Mohammad Nikzad, Aaron Nicolson, Yongsheng Gao, Jun Zhou, Kuldip K. Paliwal, Fanhua Shang
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement. |
| Researcher Affiliation | Academia | Institute for Integrated and Intelligent Systems, Griffith University, Australia; School of Artificial Intelligence, Xidian University, China |
| Pseudocode | No | The paper describes the network architecture and operations using mathematical equations and descriptive text, e.g., "The input to a convolutional unit in the left triangle of the lattice, x_{ℓ,h}, is the dense aggregation of the outputs at length ℓ-1, and heights h, h-1, ..., 1", and provides equations, but does not include any structured pseudocode or algorithm blocks (an illustrative sketch of this aggregation is given after the table). |
| Open Source Code | Yes | Availability: https://github.com/nick-nikzad/RDL-SE. |
| Open Datasets | Yes | The train-clean-100 set from the Librispeech corpus (Panayotov et al. 2015), the CSTR VCTK corpus (recordings from speakers p232 and p257 were excluded as they are used in Test Set 2) (Veaux et al. 2017), and the si and sx training sets from the TIMIT corpus (Garofolo et al. 1993) were included in the training set (73 404 clean speech recordings). |
| Dataset Splits | Yes | 5% of the clean speech recordings (3 667) were randomly selected and used as the validation set (a minimal split sketch is given after the table). |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions training various network architectures but provides no details on specific GPU/CPU models or other hardware specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions using "The Adam algorithm" for optimization but does not specify the software framework (e.g., TensorFlow, PyTorch) or its version. |
| Experiment Setup | Yes | Cross-entropy was used as the loss function. The Adam algorithm (Kingma and Ba 2014) with default hyper-parameters was used for stochastic gradient descent optimisation. A mini-batch size of 10 noisy speech signals was used. ... A total of 100 epochs were used to train all CNN architectures. A total of 10 epochs were used for the Res LSTM networks and the LSTM-IRM estimator (Chen and Wang 2017)... The Hamming window function was used for analysis and synthesis, with a frame length of 32 ms and a frame shift of 16 ms (an analysis-settings sketch is given after the table). |
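
Although the paper provides no pseudocode, the quoted description of the left-triangle input can be illustrated with a short sketch. The snippet below is a minimal Python/NumPy interpretation, not the authors' implementation from the linked repository; the function name `left_triangle_input`, the `outputs` dictionary keyed by (length, height), and the (time, channels) feature layout are all assumptions.

```python
import numpy as np

def left_triangle_input(outputs, ell, h):
    """Dense-aggregation input for the unit at length `ell` and height `h`
    in the left triangle of the lattice: the outputs at length ell - 1 and
    heights h, h-1, ..., 1 are concatenated along the channel axis.

    `outputs` maps (length, height) -> feature map of shape (time, channels);
    this layout is illustrative only.
    """
    feature_maps = [outputs[(ell - 1, k)] for k in range(h, 0, -1)]
    return np.concatenate(feature_maps, axis=-1)
```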
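
The validation split is described only as a random 5% hold-out of the 73 404 clean speech recordings (3 667 files); no random seed or file list is given. A minimal sketch of such a split, with a hypothetical `split_train_val` helper and an arbitrary seed, might look like this:

```python
import random

def split_train_val(clean_paths, val_fraction=0.05, seed=0):
    """Randomly hold out `val_fraction` of the clean speech recordings for
    validation. The seed is arbitrary; the paper does not specify one."""
    rng = random.Random(seed)
    paths = list(clean_paths)
    rng.shuffle(paths)
    n_val = round(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (training set, validation set)
```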
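
The analysis/synthesis settings quoted above (Hamming window, 32 ms frame length, 16 ms frame shift) can be expressed as a short sketch. The snippet below uses `scipy.signal.stft` and assumes a 16 kHz sampling rate, which is typical for the listed corpora but is not stated in the quoted text; it is an illustration, not the authors' code.

```python
import numpy as np
from scipy.signal import stft

def analysis_magnitude(speech, fs=16000):
    """Magnitude spectrogram with the quoted analysis settings:
    Hamming window, 32 ms frames, 16 ms frame shift.
    The 16 kHz sampling rate is an assumption."""
    frame_len = int(0.032 * fs)    # 512 samples at 16 kHz
    frame_shift = int(0.016 * fs)  # 256 samples at 16 kHz
    _, _, spec = stft(speech, fs=fs, window='hamming',
                      nperseg=frame_len, noverlap=frame_len - frame_shift)
    return np.abs(spec)
```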