Audeo: Audio Generation for a Silent Performance Video
Authors: Kun Su, Xiulong Liu, Eli Shlizerman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Audeo on piano performance videos collected from YouTube and find that the generated music is of reasonable audio quality and can be successfully recognized with high precision by popular music identification software. For the Midi evaluation, we evaluate predictions from our Video2Roll Net and Roll2Midi Net by reporting the precision, recall, accuracy, and F1 score at the frame level as defined in [50]. |
| Researcher Affiliation | Academia | Kun Su, Xiulong Liu, Eli Shlizerman; Department of Electrical & Computer Engineering, University of Washington, Seattle, USA; Department of Applied Mathematics, University of Washington, Seattle, USA |
| Pseudocode | No | The paper describes the system components and their functions but does not include any pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | The source code with examples is available in a GitHub repository. |
| Open Datasets | Yes | We evaluate Audeo on piano performance videos collected from YouTube. The minor constraint for data collection is a top-view piano performance with a fully visible keyboard. Indeed, the instrument and the camera setup are not required to be the same for the recordings that we use. In particular, we use videos recorded by Paul Barton at a frame rate of 25 fps and an audio sampling rate of 16 kHz. The Pseudo GT Midi are obtained via the Onsets and Frames framework (OF) [31]. |
| Dataset Splits | No | The paper describes training and testing sets with specific sizes ('172,404 training images and 18,788 testing images') but does not explicitly mention a separate validation split. |
| Hardware Specification | Yes | Two Nvidia Titan X GPUs are used to train all components in Audeo. |
| Software Dependencies | No | The paper mentions PyTorch and FluidSynth but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train the network using binary cross-entropy loss with a batch size of 64. ... We use the Mean Square Error (MSE) to optimize both the generator and the discriminator. ... The Perf Net is pretrained with Pseudo GT Midi using MSE loss with a batch size of 16. The log-scaled spectrogram uses a 2,048 window size and a 256 hop size. ... Adam optimizer [53] with β1 = 0.9, β2 = 0.999. For all models, we use a learning rate starting from 0.001. |
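To make the spectrogram settings quoted above concrete (window size 2,048, hop size 256, audio at 16 kHz), the following is a minimal NumPy sketch of a log-scaled magnitude spectrogram. The function name, the Hann window choice, and the epsilon floor are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def log_spectrogram(audio, n_fft=2048, hop=256, eps=1e-8):
    """Log-magnitude STFT spectrogram (assumed Hann window), matching the
    paper's reported window size of 2,048 and hop size of 256."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([
        audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)
    ])
    # Magnitude spectrum per frame: (n_frames, n_fft // 2 + 1) bins.
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spec + eps)

# One second of audio at the paper's 16 kHz sampling rate.
audio = np.random.randn(16000)
S = log_spectrogram(audio)
print(S.shape)  # (55, 1025)
```

With 16,000 samples, a 2,048-sample window, and a 256-sample hop, this yields 1 + (16000 - 2048) // 256 = 55 frames of 1,025 frequency bins each.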