Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs

Authors: Yujia Yan, Frank Cwitkowitz, Zhiyao Duan

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the MAESTRO dataset and demonstrate that the proposed model surpasses the current state-of-the-art for piano transcription. Our results suggest that the semi-CRF output layer, while still quadratic in complexity, is a simple, fast and well-performing solution for event-based prediction, and may lead to similar success in other areas which currently rely on frame-level estimates.
Researcher Affiliation | Academia | Yujia Yan, Frank Cwitkowitz, Zhiyao Duan; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627, USA; yujia.yan@rochester.edu, fcwitkow@ur.rochester.edu, zhiyao.duan@rochester.edu
Pseudocode | Yes | Algorithm 1: Forward-backward algorithm for log Z and log Z for a specific event type. Algorithm 2: Viterbi (MAP) decoding of a specific event type within an audio segment. (A generic sketch of these quadratic-time semi-CRF recursions is given after the table.)
Open Source Code | Yes | Code is available at https://github.com/Yujia-Yan/Skipping-The-Frame-Level
Open Datasets | Yes | We conduct our experiments using the MAESTRO v2 dataset [Hawthorne et al., 2019], which contains around 200 hours of MIDI-synchronized (3ms precision) virtuoso piano performance recordings.
Dataset Splits | Yes | We conduct our experiments using the MAESTRO v2 dataset [Hawthorne et al., 2019]... We compare the proposed system to the state-of-the-art methods for piano transcription using the MAESTRO v2 test split. We recompute these metrics for other systems directly from the transcribed MIDI files generated by their pretrained models. We also report results for our model trained and evaluated on the MAESTRO v3 splits for future reference.
Hardware Specification | Yes | Running time (seconds) of algorithm components that have quadratic time complexity w.r.t. the input length on Intel(R) Core(TM) i7-7800X CPU @ 3.50 GHz and Nvidia GTX 1080 Ti.
Software Dependencies | No | The algorithms were implemented in PyTorch, and we believe that further speedup can be achieved with a native C++/CUDA implementation. However, no specific version number for PyTorch or other software dependencies is provided.
Experiment Setup | Yes | We use a batch size of 12 and the AdaBelief [Zhuang et al., 2020] optimizer with a weight decay of 1e-4. We use a OneCycle [Smith and Topin, 2019] learning rate scheduler with the maximum learning rate set to 6e-4 for 180k iterations and cosine annealing. The learning rate is increased gradually for 20% of the iterations and then gradually annealed to 1.5e-5. We automatically determine the value for gradient clipping by using the 0.8 quantile of the gradient norm during the last 10k iterations, a strategy similar to Seetharaman et al. [2020]. We apply dropout with rate 0.1 on the attribute predictors and the score model. (A training-loop sketch wiring up these settings follows the semi-CRF sketch below.)
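
For reference, here is a minimal sketch of the kind of quadratic-time semi-CRF recursions summarized in the Pseudocode row (Algorithms 1-2): a forward pass for the log partition function log Z and a Viterbi (MAP) decoder over candidate segments. It assumes a generic segment score matrix `seg_score[i, j]` and does not reproduce the paper's exact event/attribute parameterization; all names below are illustrative.

```python
import torch

def semicrf_log_z(seg_score: torch.Tensor) -> torch.Tensor:
    """Forward recursion for the log partition function log Z, O(T^2).

    seg_score: (T, T) tensor; seg_score[i, j] is the score of a candidate
    segment covering frames i..j (only entries with i <= j are used).
    """
    T = seg_score.shape[0]
    # alpha[j] = log-sum-exp over the scores of all segmentations of frames 0..j-1
    alpha = seg_score.new_full((T + 1,), float("-inf"))
    alpha[0] = 0.0  # empty prefix
    for j in range(1, T + 1):
        starts = torch.arange(j)  # the last segment is [i, j-1] for some start i
        alpha[j] = torch.logsumexp(alpha[starts] + seg_score[starts, j - 1], dim=0)
    return alpha[T]

def semicrf_viterbi(seg_score: torch.Tensor):
    """Viterbi (MAP) decoding: the single best-scoring segmentation, O(T^2)."""
    T = seg_score.shape[0]
    best = seg_score.new_full((T + 1,), float("-inf"))
    best[0] = 0.0
    back = torch.zeros(T + 1, dtype=torch.long)
    for j in range(1, T + 1):
        cand = best[:j] + seg_score[:j, j - 1]
        best[j], back[j] = cand.max(dim=0)
    segments, j = [], T  # backtrack into (start, end) frame pairs
    while j > 0:
        i = int(back[j])
        segments.append((i, j - 1))
        j = i
    return best[T], segments[::-1]

# Toy usage on a 5-frame input with random segment scores:
scores = torch.randn(5, 5)
log_z = semicrf_log_z(scores)
map_score, events = semicrf_viterbi(scores)
```

In a CRF training setup, the negative log-likelihood would be `semicrf_log_z(seg_score)` minus the score of the reference segmentation, with gradients flowing through the logsumexp operations.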
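Below is a hedged sketch of how the Experiment Setup row could translate into a PyTorch training loop: AdaBelief with weight decay 1e-4, a OneCycle schedule (max learning rate 6e-4 over 180k iterations, 20% warm-up, cosine annealing toward roughly 1.5e-5), and gradient clipping at the 0.8 quantile of recent gradient norms. The model, batch, and loss are placeholders rather than the paper's architecture, and the `adabelief_pytorch` package and the `final_div_factor` mapping are assumptions about how these settings might be wired up.

```python
from collections import deque
import numpy as np
import torch
from torch.optim.lr_scheduler import OneCycleLR
from adabelief_pytorch import AdaBelief  # third-party AdaBelief optimizer (assumed available)

model = torch.nn.Linear(64, 1)  # placeholder for the transcription network
optimizer = AdaBelief(model.parameters(), lr=6e-4, weight_decay=1e-4)
total_steps = 180_000
scheduler = OneCycleLR(
    optimizer,
    max_lr=6e-4,
    total_steps=total_steps,
    pct_start=0.2,           # learning rate ramps up for the first 20% of iterations
    anneal_strategy="cos",   # cosine annealing afterwards
    final_div_factor=1.6,    # with the default div_factor=25, final LR is 6e-4 / (25 * 1.6) = 1.5e-5
)

grad_norms = deque(maxlen=10_000)  # gradient norms from the last 10k iterations

for step in range(total_steps):
    x, y = torch.randn(12, 64), torch.randn(12, 1)    # placeholder batch (batch size 12)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder for the semi-CRF loss
    loss.backward()
    # Adaptive clipping threshold: 0.8 quantile of recent gradient norms
    # (no clipping until enough history has accumulated).
    max_norm = np.quantile(grad_norms, 0.8) if len(grad_norms) >= 100 else float("inf")
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    grad_norms.append(float(total_norm))
    optimizer.step()
    scheduler.step()
```

The quantile-based threshold adapts the clipping value to the observed gradient-norm distribution rather than using a fixed constant, which is the Seetharaman et al. [2020]-style strategy the paper cites.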