Interactive Speech and Noise Modeling for Speech Enhancement
Authors: Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu
Pages: 14549-14557
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on public datasets show that the interaction module plays a key role in simultaneous modeling and the SN-Net outperforms the state-of-the-art by a large margin on various evaluation metrics. The proposed SN-Net also shows superior performance for speaker separation. |
| Researcher Affiliation | Collaboration | 1 Communication University of China 2 Microsoft Research Asia 3 Microsoft Corporation |
| Pseudocode | No | The paper describes network structures and processes in text and diagrams, but it does not include formal pseudocode or an algorithm block. |
| Open Source Code | No | The paper provides links to code for baseline methods (e.g., DTLN and Conv-TasNet) in footnotes, but does not state that the code for the proposed SN-Net is open-source, nor does it provide a link to it. |
| Open Datasets | Yes | Three public datasets are used in our experiments. DNS Challenge (Reddy et al. 2020) The DNS challenge (Reddy et al. 2020) at Interspeech 2020 provides a large dataset for training. Voice Bank + DEMAND This is a small dataset created by Valentini-Botinhao et al. (Valentini-Botinhao et al. 2016). TIMIT Corpus This dataset is used for our speaker separation experiment. |
| Dataset Splits | No | The paper specifies details for training and test sets but does not explicitly describe a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | No | The paper mentions that the algorithm is implemented in TensorFlow but does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The proposed algorithm is implemented in TensorFlow. However, no specific version numbers for TensorFlow or any other software libraries are provided. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.0002. All the layers are initialized with Xavier initialization. The training is conducted in two stages. The speech and noise branches are jointly trained first with the loss weights α = 1 and β = 0. Then the merge branch is trained with the parameters of the previous two branches fixed, using only the loss LMerge. We train both stages for 60 epochs on the DNS Challenge dataset and 400 epochs on the Voice Bank + DEMAND dataset. The batch size for all experiments is set to 32, unless otherwise specified. |
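The two-stage schedule in the setup cell can be summarized as a configuration sketch. This is a minimal illustration, not the authors' code: the function name and dictionary keys are hypothetical, while the values (optimizer, learning rate, loss weights, frozen branches, epochs, batch size) follow the paper's stated setup.

```python
# Hedged sketch of SN-Net's two-stage training schedule as described in the
# paper. Names are hypothetical; values mirror the reported configuration.

def stage_config(stage: str, dataset: str) -> dict:
    """Return the training configuration for a given stage and dataset."""
    epochs = {"dns_challenge": 60, "voicebank_demand": 400}[dataset]
    common = {
        "optimizer": "adam",        # Adam, learning rate 0.0002
        "learning_rate": 2e-4,
        "init": "xavier",           # Xavier initialization for all layers
        "batch_size": 32,
        "epochs": epochs,
    }
    if stage == "branches":
        # Stage 1: jointly train the speech and noise branches with
        # loss weights alpha = 1 and beta = 0.
        return {
            **common,
            "trainable": ["speech_branch", "noise_branch"],
            "frozen": [],
            "loss_weights": {"alpha": 1.0, "beta": 0.0},
        }
    if stage == "merge":
        # Stage 2: train only the merge branch, with the two branches'
        # parameters fixed, using only the loss L_Merge.
        return {
            **common,
            "trainable": ["merge_branch"],
            "frozen": ["speech_branch", "noise_branch"],
            "loss": "L_Merge",
        }
    raise ValueError(f"unknown stage: {stage}")
```

A training loop would then call `stage_config("branches", ...)` first and `stage_config("merge", ...)` second, freezing the listed branches before the second stage.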