Neural Synthesis of Binaural Speech From Mono Audio
Authors: Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando de la Torre, Yaser Sheikh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured by real recordings) and outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study. |
| Researcher Affiliation | Industry | Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Butler, Fernando de la Torre, Yaser Sheikh; Facebook Reality Labs, Pittsburgh, USA; {richardalex,dejanmarkovic,idgebru,stevenkrenn,gsbutler,yaser}@fb.com |
| Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis |
| Open Datasets | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis |
| Dataset Splits | Yes | We use a validation sequence and the last two minutes from each participant as test data and train the models on the remaining data. |
| Hardware Specification | Yes | On a single NVidia Tesla V100, our approach can binauralize 100 seconds of mono audio in just 6.9 seconds. |
| Software Dependencies | No | The paper mentions a 'pytorch implementation' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The temporal convolutional network consists of three sequential blocks. Each block is a stack of ten hyperconvolution layers with 64 channels, kernel size 2, and the dilation size is doubled after each layer. We train our models for 100 epochs using an Adam optimizer. Learning rates are decreased if the loss on the training set does not improve between two epochs. (A hedged PyTorch sketch of this configuration follows the table.) |
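
The reported setup translates into a compact dilated temporal convolution stack. The sketch below is an illustration only, not the authors' implementation: plain `nn.Conv1d` layers stand in for the paper's position-conditioned hyperconvolutions, and the residual connections, tanh nonlinearity, initial learning rate, and decay factor are assumptions rather than values taken from the paper.

```python
# Illustrative sketch only: 3 blocks x 10 dilated conv layers, 64 channels,
# kernel size 2, dilation doubled after each layer (1, 2, 4, ..., 512),
# as reported in the paper. Plain Conv1d replaces the paper's
# position-conditioned hyperconvolutions; residuals/tanh are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedBlock(nn.Module):
    def __init__(self, channels=64, num_layers=10, kernel_size=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size, dilation=2 ** i)
             for i in range(num_layers)]
        )

    def forward(self, x):
        for conv in self.convs:
            # Causal left-padding keeps the temporal length unchanged.
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            x = x + torch.tanh(conv(F.pad(x, (pad, 0))))  # residual update (assumption)
        return x

class TemporalConvNet(nn.Module):
    def __init__(self, channels=64, num_blocks=3, layers_per_block=10):
        super().__init__()
        self.blocks = nn.Sequential(
            *[DilatedBlock(channels, layers_per_block) for _ in range(num_blocks)]
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.blocks(x)

model = TemporalConvNet()
# Adam for 100 epochs, as stated; the initial learning rate is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# "Learning rates are decreased if the loss on the training set does not
# improve between two epochs" -- ReduceLROnPlateau with patience=0, driven by
# the training loss, mimics this; the decay factor is an assumption.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0
)
```

As a quick sanity check on this sketch, `TemporalConvNet()(torch.randn(1, 64, 4800)).shape` returns `(1, 64, 4800)` because the causal padding preserves the temporal length; each ten-layer block with kernel size 2 and doubled dilations has a receptive field of 1 + (1 + 2 + ... + 512) = 1024 samples.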