Interactive Speech and Noise Modeling for Speech Enhancement

Authors: Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu (pp. 14549-14557)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on public datasets show that the interaction module plays a key role in simultaneous modeling and that SN-Net outperforms the state of the art by a large margin on various evaluation metrics. The proposed SN-Net also shows superior performance for speaker separation.
Researcher Affiliation | Collaboration | 1. Communication University of China; 2. Microsoft Research Asia; 3. Microsoft Corporation
Pseudocode | No | The paper describes network structures and processes in text and diagrams, but it does not include formal pseudocode or an algorithm block.
Open Source Code | No | The paper provides links to code for baseline methods (e.g., DTLN and Conv-TasNet) in footnotes, but does not state that the code for the proposed SN-Net is open source or provide a link to it.
Open Datasets | Yes | Three public datasets are used in the experiments: the DNS Challenge dataset (Reddy et al. 2020) from Interspeech 2020, which provides a large training set; Voice Bank + DEMAND, a small dataset created by Valentini-Botinhao et al. (2016); and the TIMIT corpus, used for the speaker separation experiment.
Dataset Splits | No | The paper specifies training and test sets but does not explicitly describe a separate validation split with specific percentages or counts.
Hardware Specification | No | The paper mentions that the algorithm is implemented in TensorFlow but provides no details about the hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The proposed algorithm is implemented in TensorFlow, but no version numbers for TensorFlow or any other software libraries are provided.
Experiment Setup | Yes | The Adam optimizer is used with a learning rate of 0.0002, and all layers are initialized with Xavier initialization. Training is conducted in two stages: the speech and noise branches are jointly trained first with loss weights α = 1 and β = 0; the merge branch is then trained with the parameters of the previous two fixed, using only the loss L_Merge. Both stages are trained for 60 epochs on the DNS Challenge dataset and 400 epochs on Voice Bank + DEMAND. The batch size for all experiments is 32 unless otherwise specified. A hedged code sketch of this schedule follows the table.
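
For readers trying to reproduce the setup, the sketch below restates the two-stage schedule as a TensorFlow training loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the dense placeholder branches, the concatenation feeding the merge branch, and the MSE stand-ins for the paper's losses are all hypothetical, and the interaction modules that couple the branches in the real SN-Net are omitted. Only the Adam optimizer, learning rate 0.0002, Xavier initialization, loss weights α = 1 and β = 0, the freeze-then-train-merge schedule, and batch size 32 come from the paper's text.

```python
# Hedged sketch of the paper's two-stage training schedule in TensorFlow.
# Branch architectures and losses are simplified placeholders, not SN-Net.
import tensorflow as tf

ALPHA, BETA = 1.0, 0.0   # stage-1 loss weights (from the paper)
LR, BATCH = 2e-4, 32     # Adam learning rate and batch size (from the paper)
xavier = tf.keras.initializers.GlorotUniform()  # Xavier initialization

def dense_block(units=257):
    # Simplified placeholder; the real SN-Net branches use far richer blocks.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu",
                              kernel_initializer=xavier),
        tf.keras.layers.Dense(units, kernel_initializer=xavier),
    ])

speech_branch = dense_block()
noise_branch = dense_block()
merge_branch = dense_block()
opt1 = tf.keras.optimizers.Adam(learning_rate=LR)  # stage 1
opt2 = tf.keras.optimizers.Adam(learning_rate=LR)  # stage 2
mse = tf.keras.losses.MeanSquaredError()           # stand-in loss

@tf.function
def stage1_step(noisy, clean_speech, clean_noise):
    # Stage 1: jointly train the speech and noise branches with
    # loss = alpha * L_speech + beta * L_noise.
    with tf.GradientTape() as tape:
        loss = (ALPHA * mse(clean_speech, speech_branch(noisy))
                + BETA * mse(clean_noise, noise_branch(noisy)))
    train_vars = (speech_branch.trainable_variables
                  + noise_branch.trainable_variables)
    opt1.apply_gradients(zip(tape.gradient(loss, train_vars), train_vars))
    return loss

@tf.function
def stage2_step(noisy, clean_speech):
    # Stage 2: both branches frozen; train only the merge branch on a
    # placeholder for the paper's L_Merge.
    s_est = tf.stop_gradient(speech_branch(noisy))
    n_est = tf.stop_gradient(noise_branch(noisy))
    with tf.GradientTape() as tape:
        merged = merge_branch(tf.concat([s_est, n_est], axis=-1))
        loss = mse(clean_speech, merged)
    train_vars = merge_branch.trainable_variables
    opt2.apply_gradients(zip(tape.gradient(loss, train_vars), train_vars))
    return loss

# Toy check with random spectrogram-like frames (shapes are illustrative).
noisy = tf.random.normal([BATCH, 257])
clean_s = tf.random.normal([BATCH, 257])
stage1_step(noisy, clean_s, noisy - clean_s)
stage2_step(noisy, clean_s)
```

Note that with β = 0 the noise branch receives zero gradient in this simplified sketch; in the full SN-Net the interaction modules couple the two branches, so the noise branch still learns during joint training.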