Audeo: Audio Generation for a Silent Performance Video
Authors: Kun Su, Xiulong Liu, Eli Shlizerman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Audeo on piano performance videos collected from YouTube and find that the generated music is of reasonable audio quality and can be successfully recognized with high precision by popular music identification software. For the Midi evaluation, we evaluate predictions from our Video2Roll Net and Roll2Midi Net by reporting the precision, recall, accuracy, and F1 score at the frame level as defined in [50]. |
| Researcher Affiliation | Academia | Kun Su, Xiulong Liu, Eli Shlizerman; Department of Electrical & Computer Engineering, University of Washington, Seattle, USA; Department of Applied Mathematics, University of Washington, Seattle, USA |
| Pseudocode | No | The paper describes the system components and their functions but does not include any pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | The source code with examples is available in a GitHub repository. |
| Open Datasets | Yes | We evaluate Audeo on piano performance videos collected from YouTube. The minor constraint for data collection is a top-view piano performance with a fully visible keyboard. Indeed, the instrument and the camera setup are not required to be the same for the recordings that we use. In particular, we use videos recorded by Paul Barton at a frame rate of 25 fps and an audio sampling rate of 16 kHz. The Pseudo GT Midi are obtained via the Onsets and Frames framework (OF) [31]. |
| Dataset Splits | No | The paper describes training and testing sets with specific sizes ('172,404 training images and 18,788 testing images') but does not explicitly mention a separate validation split. |
| Hardware Specification | Yes | Two Nvidia Titan X GPUs are used to train all components in Audeo. |
| Software Dependencies | No | The paper mentions PyTorch and FluidSynth but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train the network using binary cross-entropy loss with a batch size of 64. ... We use the Mean Square Error (MSE) to optimize both the generator and the discriminator. ... The Perf Net is pretrained with Pseudo GT Midi using MSE loss with a batch size of 16. The log-scaled spectrogram uses a 2,048 window size and a 256 hop size. ... Adam optimizer [53] with β1 = 0.9, β2 = 0.999. For all models, we use a learning rate starting from 0.001. |
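To make the spectrogram settings quoted above concrete (window size 2,048, hop size 256, audio at 16 kHz), the following is a minimal NumPy sketch of a log-scaled magnitude spectrogram. The function name, the Hann window choice, and the epsilon floor are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def log_spectrogram(audio, n_fft=2048, hop=256, eps=1e-8):
    """Log-magnitude STFT spectrogram (assumed Hann window), matching the
    paper's reported window size of 2,048 and hop size of 256."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([
        audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)
    ])
    # Magnitude spectrum per frame: (n_frames, n_fft // 2 + 1) bins.
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spec + eps)

# One second of audio at the paper's 16 kHz sampling rate.
audio = np.random.randn(16000)
S = log_spectrogram(audio)
print(S.shape)  # (55, 1025)
```

With 16,000 samples, a 2,048-sample window, and a 256-sample hop, this yields 1 + (16000 - 2048) // 256 = 55 frames of 1,025 frequency bins each.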