Unsupervised Sound Separation Using Mixture Invariant Training

Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron Weiss, Kevin Wilson, John Hershey

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 experiments. Separation performance is measured using scale-invariant signal-to-noise ratio (SI-SNR) [25]. Results on single anechoic and reverberant 2-source mixtures are shown in Figure 2, and results on single-source inputs are in Appendix E. (A minimal SI-SNR/SI-SNRi sketch appears after the table.)
Researcher Affiliation | Collaboration | Scott Wisdom (Google Research, scottwisdom@google.com); Efthymios Tzinis (UIUC, etzinis2@illinois.edu); Hakan Erdogan (Google Research, hakanerdogan@google.com); Ron J. Weiss (Google Research, ronw@google.com); Kevin Wilson (Google Research, kwwilson@google.com); John R. Hershey (Google Research, johnhershey@google.com)
Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. (A hedged sketch of the MixIT objective appears after the table.)
Open Source Code | Yes | MixIT code on GitHub: https://github.com/google-research/sound-separation/tree/master/models/neurips2020_mixit
Open Datasets | Yes | For speech separation experiments, we use the WSJ0-2mix [17] and Libri2Mix [9] datasets, sampled at 8 kHz and 16 kHz. We also employ the reverberant spatialized version of WSJ0-2mix [44] and a reverberant version of Libri2Mix we created... For our experiments, we use the recently released Free Universal Sound Separation (FUSS) dataset [47, 48].
Dataset Splits | No | For training, 3 second clips are used for WSJ0-2mix, and 10 second clips for Libri2Mix. Evaluation always uses single mixtures of two sources. On a held-out test set, the supervised model achieves 15.0 dB SI-SNRi for speech, and the unsupervised MixIT model achieves 11.4 dB SI-SNRi. The paper discusses training and testing data usage, but does not specify validation splits or exact percentages/counts for train/test/validation splits for reproduction.
Hardware Specification | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and a learning rate of 10^-3.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer [22], but does not specify library versions or list the software dependencies needed for reproduction.
Experiment Setup | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and a learning rate of 10^-3. (A minimal training-configuration sketch appears after the table.)
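
Below is a minimal NumPy sketch of the SI-SNR metric referenced in the Research Type and Dataset Splits rows, together with the SI-SNRi improvement over the unprocessed mixture. It follows the common scale-invariant SNR definition the paper cites as [25]; it is an illustrative re-statement, not the authors' evaluation code.

```python
import numpy as np

def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB."""
    # Remove means so the metric ignores DC offsets.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def si_snr_improvement(estimate: np.ndarray, reference: np.ndarray, mixture: np.ndarray) -> float:
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the input mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```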
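
Because the paper provides no pseudocode, the following hedged sketch restates the mixture invariant training (MixIT) objective from the paper's equations: the model separates a mixture of two reference mixtures into M sources, and each estimated source is assigned to exactly one of the two reference mixtures so that the total reconstruction loss is minimized over all assignments. The function names and the plain negative-SNR per-mixture loss are illustrative assumptions, not the released TensorFlow implementation.

```python
import itertools
import numpy as np

def neg_snr_loss(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Negative SNR in dB (lower is better), used here as the per-mixture loss."""
    error = reference - estimate
    return -10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))

def mixit_loss(x1: np.ndarray, x2: np.ndarray, est_sources: np.ndarray) -> float:
    """Mixture invariant training loss for two reference mixtures x1 and x2.

    est_sources has shape (M, T): the M sources the model separates from the
    mixture of mixtures x1 + x2. Each source is assigned to one of the two
    reference mixtures; the assignment with the lowest total loss is used.
    """
    num_sources = est_sources.shape[0]
    best = np.inf
    # Enumerate all 2^M binary assignments of estimated sources to the two mixtures.
    for assignment in itertools.product([0, 1], repeat=num_sources):
        assignment = np.asarray(assignment)
        est_x1 = est_sources[assignment == 0].sum(axis=0)
        est_x2 = est_sources[assignment == 1].sum(axis=0)
        total = neg_snr_loss(x1, est_x1) + neg_snr_loss(x2, est_x2)
        best = min(best, total)
    return best
```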
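
For convenience, the training setup quoted in the Hardware Specification and Experiment Setup rows is summarized below as a configuration sketch; the dictionary keys are illustrative placeholders, not names from the released code.

```python
# Training setup as reported in the paper; key names are assumptions for illustration.
train_config = {
    "optimizer": "adam",                       # Adam optimizer [22]
    "batch_size": 256,
    "learning_rate": 1e-3,                     # the paper's 10^-3
    "hardware": "4 Google Cloud TPUs (16 chips)",
    "train_clip_seconds": {"WSJ0-2mix": 3, "Libri2Mix": 10},  # from the Dataset Splits row
}
```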