Deep Residual-Dense Lattice Network for Speech Enhancement

Authors: Mohammad Nikzad, Aaron Nicolson, Yongsheng Gao, Jun Zhou, Kuldip K. Paliwal, Fanhua Shang

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement.
Researcher Affiliation | Academia | Institute for Integrated and Intelligent Systems, Griffith University, Australia; School of Artificial Intelligence, Xidian University, China
Pseudocode | No | The paper describes the network architecture and operations using mathematical equations and descriptive text, e.g., "The input to a convolutional unit in the left triangle of the lattice, x_{h,l}, is the dense aggregation of the outputs at length l - 1 and heights h, h - 1, ..., 1:" and provides equations, but does not include any structured pseudocode or algorithm blocks. (An illustrative sketch of the quoted dense aggregation is given after this table.)
Open Source Code | Yes | Availability: https://github.com/nick-nikzad/RDL-SE.
Open Datasets | Yes | The train-clean-100 set from the Librispeech corpus (Panayotov et al. 2015), the CSTR VCTK corpus (recordings from speakers p232 and p257 were excluded as they are used in Test Set 2) (Veaux et al. 2017), and the si and sx training sets from the TIMIT corpus (Garofolo et al. 1993) were included in the training set (73 404 clean speech recordings).
Dataset Splits | Yes | 5% of the clean speech recordings (3 667) were randomly selected and used as the validation set. (A sketch of such a split follows the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions training various network architectures but provides no details on specific GPU/CPU models or other hardware specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions using "The Adam algorithm" for optimisation but does not specify the software framework (e.g., TensorFlow, PyTorch) or its version.
Experiment Setup | Yes | Cross-entropy was used as the loss function. The Adam algorithm (Kingma and Ba 2014) with default hyper-parameters was used for stochastic gradient descent optimisation. A mini-batch size of 10 noisy speech signals was used. ... A total of 100 epochs were used to train all CNN architectures. A total of 10 epochs were used for the Res LSTM networks and the LSTM-IRM estimator (Chen and Wang 2017)... The Hamming window function was used for analysis and synthesis, with a frame length of 32 ms and a frame shift of 16 ms. (A hedged sketch of this configuration follows the table.)
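
To illustrate the dense aggregation quoted in the Pseudocode row, the following is a minimal sketch, assuming PyTorch and a hypothetical outputs[l][h] indexing of the lattice feature maps; it is not the authors' implementation.

import torch

# Hedged sketch: the input to a convolutional unit in the left triangle
# of the lattice is the channel-wise concatenation (dense aggregation)
# of the outputs at length l - 1 and heights h, h - 1, ..., 1.
# outputs[l][h] is assumed to hold the feature map at length l, height h.
def left_triangle_input(outputs, h, l):
    feats = [outputs[l - 1][k] for k in range(h, 0, -1)]
    return torch.cat(feats, dim=1)  # concatenate along the channel axis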
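
The 5% validation split noted in the Dataset Splits row could be reproduced along these lines; this is a hedged sketch with a hypothetical list of file paths, not the authors' data-preparation script, and the random seed is an assumption.

import random

# Hold out 5% of the clean speech recordings (3 667 of 73 404 in the paper)
# as a randomly selected validation set; the remainder is the training set.
def make_split(clean_paths, val_fraction=0.05, seed=0):
    rng = random.Random(seed)          # seed is an assumption, not reported
    paths = list(clean_paths)
    rng.shuffle(paths)
    n_val = round(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (train, validation)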
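
The Experiment Setup row can be read as the configuration sketched below, assuming PyTorch and a 16 kHz sampling rate (the rate is not quoted in this summary); the model is a placeholder, not the RDL-Net architecture, and the exact cross-entropy target form is an assumption.

import torch

SAMPLE_RATE = 16000                      # assumption; not stated in this summary
FRAME_LEN = int(0.032 * SAMPLE_RATE)     # 32 ms analysis frame length
FRAME_SHIFT = int(0.016 * SAMPLE_RATE)   # 16 ms frame shift
window = torch.hamming_window(FRAME_LEN) # Hamming analysis/synthesis window

def stft_magnitude(signal):
    # Magnitude spectra with 32 ms Hamming frames and 16 ms shift, as reported.
    spec = torch.stft(signal, n_fft=FRAME_LEN, hop_length=FRAME_SHIFT,
                      win_length=FRAME_LEN, window=window, return_complex=True)
    return spec.abs()

model = torch.nn.Linear(FRAME_LEN // 2 + 1, FRAME_LEN // 2 + 1)  # placeholder for an RDL-Net
optimiser = torch.optim.Adam(model.parameters())  # Adam with default hyper-parameters, as reported
loss_fn = torch.nn.BCEWithLogitsLoss()            # cross-entropy loss (assumed binary form over mask-like targets)
BATCH_SIZE, EPOCHS = 10, 100                      # mini-batch of 10; 100 epochs for the CNN architectures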