Neural Synthesis of Binaural Speech From Mono Audio
Authors: Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando de la Torre, Yaser Sheikh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured by real recordings) and outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study. |
| Researcher Affiliation | Industry | Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Butler, Fernando de la Torre, Yaser Sheikh; Facebook Reality Labs, Pittsburgh, USA; {richardalex,dejanmarkovic,idgebru,stevenkrenn,gsbutler,yaser}@fb.com |
| Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis |
| Open Datasets | Yes | Dataset and code are available online: https://github.com/facebookresearch/BinauralSpeechSynthesis |
| Dataset Splits | Yes | We use a validation sequence and the last two minutes from each participant as test data and train the models on the remaining data. |
| Hardware Specification | Yes | On a single NVidia Tesla V100, our approach can binauralize 100 seconds of mono audio in just 6.9 seconds. |
| Software Dependencies | No | The paper mentions a 'pytorch implementation' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The temporal convolutional network consists of three sequential blocks. Each block is a stack of ten hyperconvolution layers with 64 channels, kernel size 2, and the dilation size is doubled after each layer. We train our models for 100 epochs using an Adam optimizer. Learning rates are decreased if the loss on the training set does not improve between two epochs. (A hedged PyTorch sketch of this configuration follows the table.) |
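
The reported setup translates into a compact dilated temporal convolution stack. The sketch below is an illustration only, not the authors' implementation: plain `nn.Conv1d` layers stand in for the paper's position-conditioned hyperconvolutions, and the residual connections, tanh nonlinearity, initial learning rate, and decay factor are assumptions rather than values taken from the paper.

```python
# Illustrative sketch only: 3 blocks x 10 dilated conv layers, 64 channels,
# kernel size 2, dilation doubled after each layer (1, 2, 4, ..., 512),
# as reported in the paper. Plain Conv1d replaces the paper's
# position-conditioned hyperconvolutions; residuals/tanh are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedBlock(nn.Module):
    def __init__(self, channels=64, num_layers=10, kernel_size=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size, dilation=2 ** i)
             for i in range(num_layers)]
        )

    def forward(self, x):
        for conv in self.convs:
            # Causal left-padding keeps the temporal length unchanged.
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            x = x + torch.tanh(conv(F.pad(x, (pad, 0))))  # residual update (assumption)
        return x

class TemporalConvNet(nn.Module):
    def __init__(self, channels=64, num_blocks=3, layers_per_block=10):
        super().__init__()
        self.blocks = nn.Sequential(
            *[DilatedBlock(channels, layers_per_block) for _ in range(num_blocks)]
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.blocks(x)

model = TemporalConvNet()
# Adam for 100 epochs, as stated; the initial learning rate is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# "Learning rates are decreased if the loss on the training set does not
# improve between two epochs" -- ReduceLROnPlateau with patience=0, driven by
# the training loss, mimics this; the decay factor is an assumption.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0
)
```

As a quick sanity check on this sketch, `TemporalConvNet()(torch.randn(1, 64, 4800)).shape` returns `(1, 64, 4800)` because the causal padding preserves the temporal length; each ten-layer block with kernel size 2 and doubled dilations has a receptive field of 1 + (1 + 2 + ... + 512) = 1024 samples.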