Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs
Authors: Yujia Yan, Frank Cwitkowitz, Zhiyao Duan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the MAESTRO dataset and demonstrate that the proposed model surpasses the current state-of-the-art for piano transcription. Our results suggest that the semi-CRF output layer, while still quadratic in complexity, is a simple, fast and well-performing solution for event-based prediction, and may lead to similar success in other areas which currently rely on frame-level estimates. |
| Researcher Affiliation | Academia | Yujia Yan, Frank Cwitkowitz, Zhiyao Duan; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627, USA; yujia.yan@rochester.edu, fcwitkow@ur.rochester.edu, zhiyao.duan@rochester.edu |
| Pseudocode | Yes | Algorithm 1: Forward-backward algorithm for log Z and ∇ log Z for a specific event type. Algorithm 2: Viterbi (MAP) decoding of a specific event type within an audio segment. (A minimal sketch of the forward recursion appears below the table.) |
| Open Source Code | Yes | Code is available at https://github.com/Yujia-Yan/Skipping-The-Frame-Level |
| Open Datasets | Yes | We conduct our experiments using the MAESTRO v2 dataset [Hawthorne et al., 2019], which contains around 200 hours of MIDI-synchronized (3ms precision) virtuoso piano performance recordings. |
| Dataset Splits | Yes | We conduct our experiments using the MAESTRO v2 dataset [Hawthorne et al., 2019]... We compare the proposed system to the state-of-the-art methods for piano transcription using the MAESTRO v2 test split. We recompute these metrics for other systems directly from the transcribed MIDI files generated by their pretrained models. We also report our results for our model trained and evaluated on the MAESTRO v3 splits for future reference. |
| Hardware Specification | Yes | Running time (seconds) of algorithm components that have quadratic time complexity w.r.t. the input length on an Intel(R) Core(TM) i7-7800X CPU @ 3.50 GHz and an Nvidia GTX 1080 Ti. |
| Software Dependencies | No | The algorithms were implemented in PyTorch, and we believe that further speedup can be achieved with a native C++/CUDA implementation. However, no specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | We use a batch size of 12 and the AdaBelief [Zhuang et al., 2020] optimizer with a weight decay of 1e-4. We use a OneCycle [Smith and Topin, 2019] learning rate scheduler with maximum learning rate set to 6e-4 for 180k iterations and cosine annealing. The learning rate is increased gradually for 20% of iterations and then gradually annealed to 1.5e-5. We automatically determine the value for gradient clipping by using the 0.8 quantile of the gradient norm during the last 10k iterations, which is a strategy similar to Seetharaman et al. [2020]. We apply dropout with rate 0.1 on the attribute predictors and the score model. (A hedged sketch of this training configuration appears after the table.) |
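
The pseudocode row refers to dynamic programs over *segments* rather than frames. Below is a minimal sketch of the quadratic-time forward recursion for computing log Z for a single event type, assuming every frame is covered by some segment (event or non-event) and that a precomputed matrix `seg_score` is a hypothetical stand-in for the model's learned segment scores. The paper's actual Algorithm 1 also computes ∇ log Z via a backward pass and handles details of the event representation not reproduced here.

```python
import torch


def semicrf_log_partition(seg_score: torch.Tensor) -> torch.Tensor:
    """Log partition function (log Z) of a semi-CRF over one event type.

    seg_score: (T, T) tensor where seg_score[i, j] scores a segment
    covering frames i..j inclusive; entries with i > j are never read.
    """
    T = seg_score.shape[0]
    # alpha[t] = log-sum of scores over all segmentations of frames 0..t-1
    alpha = seg_score.new_full((T + 1,), float("-inf"))
    alpha[0] = 0.0  # empty prefix contributes score 1 (log 1 = 0)
    for t in range(1, T + 1):
        # the last segment of any segmentation ending at frame t-1
        # starts at some frame i in 0..t-1
        starts = torch.arange(t)
        alpha[t] = torch.logsumexp(
            alpha[starts] + seg_score[starts, t - 1], dim=0
        )
    return alpha[T]


if __name__ == "__main__":
    torch.manual_seed(0)
    print(semicrf_log_partition(torch.randn(8, 8)))
```

Replacing the `logsumexp` with a `max` (plus backpointers) turns this recursion into the Viterbi (MAP) decoder of Algorithm 2; both loops are quadratic in the number of frames, consistent with the complexity noted in the Research Type row.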
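
The hyperparameters quoted in the Experiment Setup row map onto standard PyTorch components. The following is a hedged sketch of such a training loop, assuming the third-party `adabelief_pytorch` package for AdaBelief and `torch.optim.lr_scheduler.OneCycleLR` for the schedule; `model` and `train_loader` are hypothetical placeholders, and the quantile-based gradient clipping is a plain reading of the paper's description, not the authors' released code.

```python
from collections import deque

import torch
from adabelief_pytorch import AdaBelief  # third-party package (assumption)

TOTAL_STEPS = 180_000  # 180k iterations, as reported in the paper


def train(model: torch.nn.Module, train_loader) -> None:
    optimizer = AdaBelief(model.parameters(), lr=6e-4, weight_decay=1e-4)
    # 20% warm-up to a max LR of 6e-4, then cosine annealing; with these
    # divisors the final LR is 6e-4 / (25 * 1.6) = 1.5e-5.
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=6e-4,
        total_steps=TOTAL_STEPS,
        pct_start=0.2,
        anneal_strategy="cos",
        div_factor=25.0,
        final_div_factor=1.6,
    )
    grad_norms = deque(maxlen=10_000)  # norms from the last 10k iterations
    for step, batch in enumerate(train_loader):  # batch size 12 in the paper
        loss = model(batch)  # hypothetical: forward pass returns the loss
        optimizer.zero_grad()
        loss.backward()
        # clip at the 0.8 quantile of recently observed gradient norms
        if grad_norms:
            max_norm = torch.tensor(list(grad_norms)).quantile(0.8).item()
        else:
            max_norm = float("inf")  # no history yet: effectively no clipping
        total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        grad_norms.append(float(total_norm))
        optimizer.step()
        scheduler.step()
        if step + 1 >= TOTAL_STEPS:
            break
```

The dropout of 0.1 on the attribute predictors and the score model would live inside the model definition and is not shown here.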