Unsupervised Sound Separation Using Mixture Invariant Training
Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron Weiss, Kevin Wilson, John Hershey
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports four experiments. Separation performance is measured using scale-invariant signal-to-noise ratio (SI-SNR) [25]. Results on anechoic and reverberant two-source mixtures are shown in Figure 2, and results on single-source inputs are in Appendix E. |
| Researcher Affiliation | Collaboration | Scott Wisdom (Google Research, scottwisdom@google.com); Efthymios Tzinis (UIUC, etzinis2@illinois.edu); Hakan Erdogan (Google Research, hakanerdogan@google.com); Ron J. Weiss (Google Research, ronw@google.com); Kevin Wilson (Google Research, kwwilson@google.com); John R. Hershey (Google Research, johnhershey@google.com) |
| Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | MixIT code on GitHub: https://github.com/google-research/sound-separation/tree/master/models/neurips2020_mixit |
| Open Datasets | Yes | For speech separation experiments, we use the WSJ0-2mix [17] and Libri2Mix [9] datasets, sampled at 8 kHz and 16 kHz. We also employ the reverberant spatialized version of WSJ0-2mix [44] and a reverberant version of Libri2Mix we created... For our experiments, we use the recently released Free Universal Sound Separation (FUSS) dataset [47, 48] |
| Dataset Splits | No | For training, 3-second clips are used for WSJ0-2mix, and 10-second clips for Libri2Mix. Evaluation always uses single mixtures of two sources. On a held-out test set, the supervised model achieves 15.0 dB SI-SNRi for speech, and the unsupervised MixIT model achieves 11.4 dB SI-SNRi. The paper discusses training and testing data usage, but does not specify validation splits or exact percentages/counts for train/test/validation splits for reproduction. |
| Hardware Specification | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and learning rate of 10⁻³. |
| Software Dependencies | No | The paper mentions software components but does not specify the libraries or version numbers needed for reproduction. |
| Experiment Setup | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and learning rate of 10⁻³. |
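The evaluation metric named in the table, scale-invariant signal-to-noise ratio (SI-SNR), is a standard separation metric: the estimate is projected onto the reference so that any rescaling of the output leaves the score unchanged. A minimal NumPy sketch (the function name and `eps` stabilizer are our own choices, not from the paper):

```python
import numpy as np

def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio, in dB (higher is better)."""
    # Zero-mean both signals so DC offsets do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling: project the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # scaled reference component
    noise = estimate - target           # everything orthogonal to it
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(noise, noise) + eps)))
```

Because of the projection step, `si_snr(2 * x, x)` scores as highly as `si_snr(x, x)`, which is why the paper can report SI-SNR improvement (SI-SNRi) without worrying about output gain.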
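The paper's core idea, mixture invariant training, trains on a mixture of mixtures: the model emits several estimated sources, and the loss takes the best binary assignment of those sources back to the two input mixtures. A brute-force sketch of that loss, assuming small source counts (the function names and the use of a generic `loss_fn` are illustrative, not the authors' implementation):

```python
import itertools
import numpy as np

def mixit_loss(est_sources: np.ndarray, mix1: np.ndarray, mix2: np.ndarray,
               loss_fn) -> float:
    """Mixture invariant loss: minimum over all assignments of estimated
    sources to the two reference mixtures.

    est_sources: (M, T) array of M estimated source waveforms.
    loss_fn:     per-signal loss, e.g. MSE or negative SI-SNR.
    """
    n_sources = est_sources.shape[0]
    best = np.inf
    # Enumerate all 2^M ways to assign each source to mixture 1 or mixture 2.
    for assign in itertools.product([0, 1], repeat=n_sources):
        a = np.array(assign)
        remix1 = (est_sources * (a == 0)[:, None]).sum(axis=0)
        remix2 = (est_sources * (a == 1)[:, None]).sum(axis=0)
        best = min(best, loss_fn(remix1, mix1) + loss_fn(remix2, mix2))
    return best
```

The exhaustive enumeration is exponential in the number of output sources, which is tolerable for the small M used in practice; the point of the sketch is the invariance, namely that the model is never told which source belongs to which mixture.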