Deep Residual-Dense Lattice Network for Speech Enhancement

Authors: Mohammad Nikzad, Aaron Nicolson, Yongsheng Gao, Jun Zhou, Kuldip K. Paliwal, Fanhua Shang

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement.
Researcher Affiliation | Academia | Institute for Integrated and Intelligent Systems, Griffith University, Australia; School of Artificial Intelligence, Xidian University, China
Pseudocode | No | The paper describes the network architecture and operations using mathematical equations and descriptive text, e.g., "The input to a convolutional unit in the left triangle of the lattice, x_{h,l}, is the dense aggregation of the outputs at length l - 1 and heights h, h - 1, ..., 1:" and provides equations, but does not include any structured pseudocode or algorithm blocks. (An illustrative sketch of the quoted dense aggregation is given after this table.)
Open Source Code | Yes | Availability: https://github.com/nick-nikzad/RDL-SE.
Open Datasets | Yes | The train-clean-100 set from the Librispeech corpus (Panayotov et al. 2015), the CSTR VCTK corpus (recordings from speakers p232 and p257 were excluded as they are used in Test Set 2) (Veaux et al. 2017), and the si and sx training sets from the TIMIT corpus (Garofolo et al. 1993) were included in the training set (73 404 clean speech recordings).
Dataset Splits | Yes | 5% of the clean speech recordings (3 667) were randomly selected and used as the validation set. (A sketch of such a split follows the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions training various network architectures but provides no details on specific GPU/CPU models or other hardware specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions using "The Adam algorithm" for optimisation but does not specify the software framework (e.g., TensorFlow, PyTorch) or its version.
Experiment Setup | Yes | Cross-entropy was used as the loss function. The Adam algorithm (Kingma and Ba 2014) with default hyper-parameters was used for stochastic gradient descent optimisation. A mini-batch size of 10 noisy speech signals was used. ... A total of 100 epochs were used to train all CNN architectures. A total of 10 epochs were used for the Res LSTM networks and the LSTM-IRM estimator (Chen and Wang 2017)... The Hamming window function was used for analysis and synthesis, with a frame length of 32 ms and a frame shift of 16 ms. (A hedged sketch of this configuration follows the table.)
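
To illustrate the dense aggregation quoted in the Pseudocode row, the following is a minimal sketch, assuming PyTorch and a hypothetical outputs[l][h] indexing of the lattice feature maps; it is not the authors' implementation.

import torch

# Hedged sketch: the input to a convolutional unit in the left triangle
# of the lattice is the channel-wise concatenation (dense aggregation)
# of the outputs at length l - 1 and heights h, h - 1, ..., 1.
# outputs[l][h] is assumed to hold the feature map at length l, height h.
def left_triangle_input(outputs, h, l):
    feats = [outputs[l - 1][k] for k in range(h, 0, -1)]
    return torch.cat(feats, dim=1)  # concatenate along the channel axis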
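
The 5% validation split noted in the Dataset Splits row could be reproduced along these lines; this is a hedged sketch with a hypothetical list of file paths, not the authors' data-preparation script, and the random seed is an assumption.

import random

# Hold out 5% of the clean speech recordings (3 667 of 73 404 in the paper)
# as a randomly selected validation set; the remainder is the training set.
def make_split(clean_paths, val_fraction=0.05, seed=0):
    rng = random.Random(seed)          # seed is an assumption, not reported
    paths = list(clean_paths)
    rng.shuffle(paths)
    n_val = round(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (train, validation)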
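
The Experiment Setup row can be read as the configuration sketched below, assuming PyTorch and a 16 kHz sampling rate (the rate is not quoted in this summary); the model is a placeholder, not the RDL-Net architecture, and the exact cross-entropy target form is an assumption.

import torch

SAMPLE_RATE = 16000                      # assumption; not stated in this summary
FRAME_LEN = int(0.032 * SAMPLE_RATE)     # 32 ms analysis frame length
FRAME_SHIFT = int(0.016 * SAMPLE_RATE)   # 16 ms frame shift
window = torch.hamming_window(FRAME_LEN) # Hamming analysis/synthesis window

def stft_magnitude(signal):
    # Magnitude spectra with 32 ms Hamming frames and 16 ms shift, as reported.
    spec = torch.stft(signal, n_fft=FRAME_LEN, hop_length=FRAME_SHIFT,
                      win_length=FRAME_LEN, window=window, return_complex=True)
    return spec.abs()

model = torch.nn.Linear(FRAME_LEN // 2 + 1, FRAME_LEN // 2 + 1)  # placeholder for an RDL-Net
optimiser = torch.optim.Adam(model.parameters())  # Adam with default hyper-parameters, as reported
loss_fn = torch.nn.BCEWithLogitsLoss()            # cross-entropy loss (assumed binary form over mask-like targets)
BATCH_SIZE, EPOCHS = 10, 100                      # mini-batch of 10; 100 epochs for the CNN architectures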