Neural Synthesis of Binaural Speech From Mono Audio

Authors: Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando de la Torre, Yaser Sheikh

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured by real recordings) and outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study.
Researcher Affiliation | Industry | Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Butler, Fernando de la Torre, Yaser Sheikh; Facebook Reality Labs, Pittsburgh, USA; {richardalex,dejanmarkovic,idgebru,stevenkrenn,gsbutler,yaser}@fb.com
Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis
Open Datasets | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis
Dataset Splits | Yes | We use a validation sequence and the last two minutes from each participant as test data and train the models on the remaining data. (A hypothetical split sketch follows the table.)
Hardware Specification | Yes | On a single NVidia Tesla V100, our approach can binauralize 100 seconds of mono audio in just 6.9 seconds.
Software Dependencies | No | The paper mentions a 'pytorch implementation' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The temporal convolutional network consists of three sequential blocks. Each block is a stack of ten hyperconvolution layers with 64 channels, kernel size 2, and the dilation size is doubled after each layer. We train our models for 100 epochs using an Adam optimizer. Learning rates are decreased if between two epochs the loss on the training set did not improve. (A minimal architecture sketch follows the table.)
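Referenced in the Dataset Splits row above: a hypothetical sketch of the described split, assuming one waveform array per participant. The sample rate, the placeholder recordings, and all names (split_dataset, validation_sequence) are illustrative assumptions, not details taken from the paper or its repository.

import numpy as np

SAMPLE_RATE = 48_000      # assumed sample rate; the actual dataset may differ
TEST_SECONDS = 2 * 60     # "the last two minutes from each participant" are held out as test data

def split_dataset(recordings, validation_key):
    """Hold out one validation sequence and the last two minutes of every other
    participant's recording as test data; the remainder is training data."""
    test_len = TEST_SECONDS * SAMPLE_RATE
    validation = recordings[validation_key]
    train, test = [], []
    for name, audio in recordings.items():
        if name == validation_key:
            continue
        train.append(audio[:-test_len])   # everything but the final two minutes -> training
        test.append(audio[-test_len:])    # final two minutes -> test
    return train, validation, test

# Toy usage with placeholder three-minute recordings.
demo = {f"participant_{i}": np.zeros(3 * 60 * SAMPLE_RATE, dtype=np.float32) for i in range(3)}
demo["validation_sequence"] = np.zeros(3 * 60 * SAMPLE_RATE, dtype=np.float32)
train, validation, test = split_dataset(demo, "validation_sequence")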
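Referenced in the Experiment Setup row above: a minimal PyTorch sketch of the stated configuration (three sequential blocks, each with ten convolution layers of 64 channels, kernel size 2, dilation doubled per layer; Adam for 100 epochs with plateau-based learning rate decay). Plain dilated Conv1d layers stand in for the paper's hyperconvolutions, and the residual connections, input/output projections, and learning rate are assumptions; this is a sketch, not the authors' implementation.

import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """One block: ten conv layers, 64 channels, kernel size 2, dilation doubled each layer (1..512)."""
    def __init__(self, channels=64, n_layers=10, kernel_size=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size,
                      dilation=2 ** i, padding=(kernel_size - 1) * 2 ** i)
            for i in range(n_layers)
        )

    def forward(self, x):
        for conv in self.convs:
            y = conv(x)[..., : x.shape[-1]]   # crop padding so the output is causal and length-preserving
            x = torch.relu(y) + x             # residual connection (assumption, not from the quoted text)
        return x

class BinauralSketch(nn.Module):
    """Three sequential blocks, as stated in the experiment setup."""
    def __init__(self, channels=64, n_blocks=3):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, 1)   # mono input -> 64 channels (illustrative)
        self.blocks = nn.Sequential(*(DilatedBlock(channels) for _ in range(n_blocks)))
        self.out = nn.Conv1d(channels, 2, 1)   # 64 channels -> left/right waveform (illustrative)

    def forward(self, mono):
        return self.out(self.blocks(self.inp(mono)))

model = BinauralSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate is an assumption
# "Learning rates are decreased if ... the loss ... did not improve" approximated with a plateau scheduler.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)

A forward pass on a one-second mono clip, model(torch.randn(1, 1, 48000)), returns a (1, 2, 48000) two-channel tensor, i.e. a left/right waveform of the same length as the input.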