Learning to Groove with Inverse Sequence Transformations

Authors: Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using Seq2Seq and recurrent variational Information Bottleneck (VIB) models. Though Seq2Seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g., Pix2Pix (Isola et al., 2017) and Vid2Vid (Wang et al., 2018a)) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and models for learning to invert them have real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, including demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score). (A minimal sketch of this paired-data construction appears after this table.)
Researcher Affiliation | Collaboration | 1 School of Information, University of California, Berkeley, CA, U.S.A.; 2 Google AI, Mountain View, CA, U.S.A.
Pseudocode | No | The paper includes architectural diagrams (Figures 3 and 4) but no explicit pseudocode blocks or algorithms.
Open Source Code | Yes | Code, data, trained models, and audio examples are available at https://g.co/magenta/groovae.
Open Datasets | Yes | The dataset, which we refer to as the Groove MIDI Dataset (GMD), is publicly available for download at https://magenta.tensorflow.org/datasets/groove.
Dataset Splits | Yes | After partitioning recorded sequences into training, development, and test sets, we slide fixed-size windows across all full sequences to create drum patterns of fixed length... A train/validation/test split configuration is provided for easier comparison of model accuracy on various tasks. (See the windowing sketch after this table.)
Hardware Specification | No | The paper mentions using a "Roland TD-11 electronic drum kit" for data collection, but does not specify any hardware (CPU, GPU, memory) used for training or running the experiments.
Software Dependencies | No | "We train all our neural models with Tensorflow (Abadi et al., 2016) and the Adam optimizer (Kingma & Ba, 2014)." The paper names TensorFlow and the Adam optimizer but specifies neither the TensorFlow version nor any other software dependencies. (See the optimizer sketch after this table.)
Experiment Setup | No | The paper does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations for the neural models. It mentions "a single hidden layer of size 256 and ReLU nonlinearities" for the MLP and reducing LSTM layer dimensions from 2048 to 512 and the dimension of z from 512 to 256 for Seq2Seq, but these are architectural details, not training setup parameters. (See the MLP sketch after this table.)
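
The inversion recipe described in the Research Type row lends itself to a compact illustration. Below is a minimal Python sketch, not the authors' code, of manufacturing paired training data by quantizing expressive onset times to a metrical grid; the note representation and the 16th-note grid (0.25 beats) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the paired-data idea: apply a
# simple, lossy transformation (here, quantizing onset times to a metrical
# grid) and keep (transformed, original) pairs so a model can learn the
# inverse mapping. The note representation and the 16th-note grid
# (0.25 beats) are illustrative assumptions.

def quantize_onsets(onsets, grid=0.25):
    """Snap each onset time (in beats) to the nearest grid point."""
    return [round(t / grid) * grid for t in onsets]

def make_training_pair(performance_onsets, grid=0.25):
    """Return (input, target): the quantized 'score' and the original
    expressive performance the model should learn to recover."""
    return quantize_onsets(performance_onsets, grid), performance_onsets

# Example: a slightly loose performance and its quantized counterpart.
performed = [0.02, 0.51, 1.04, 1.48, 2.03]
score, target = make_training_pair(performed)
print(score)   # [0.0, 0.5, 1.0, 1.5, 2.0]
print(target)  # [0.02, 0.51, 1.04, 1.48, 2.03]
```

Because the quantization is applied programmatically, arbitrarily many (score, performance) pairs come for free from unpaired performances, which is the property that sidesteps the need for painstakingly aligned corpora.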
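The Dataset Splits row quotes a partition-then-window scheme: recordings are assigned whole to a split first, and fixed-size windows are cut only afterwards, so no window straddles two splits. A hedged sketch of that order of operations follows; the window length, hop size, split proportions, and seed are assumptions, not values from the paper.

```python
# Hedged sketch of the partition-then-window scheme quoted above: whole
# sequences are assigned to a split first, and fixed-size windows are cut
# afterwards, so no window straddles two splits. Window length, hop size,
# split proportions, and seed are assumptions, not values from the paper.
import random

def split_then_window(sequences, window=64, hop=16, seed=0):
    rng = random.Random(seed)
    rng.shuffle(sequences)
    n = len(sequences)
    splits = {
        "train": sequences[: int(0.8 * n)],
        "validation": sequences[int(0.8 * n): int(0.9 * n)],
        "test": sequences[int(0.9 * n):],
    }
    return {
        name: [
            seq[i: i + window]
            for seq in seqs
            for i in range(0, len(seq) - window + 1, hop)
        ]
        for name, seqs in splits.items()
    }

# Toy usage: ten "sequences" of integer timesteps.
data = [list(range(n)) for n in (80, 100, 64, 90, 120, 70, 96, 64, 72, 88)]
print({name: len(w) for name, w in split_then_window(data).items()})
```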
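To make concrete what the Software Dependencies row flags as unspecified, the snippet below instantiates Adam with the standard defaults from Kingma & Ba (2014) in a current TensorFlow API. These defaults are an assumption about the authors' setup, not a reported configuration.

```python
# The paper names TensorFlow and Adam but not versions or hyperparameters.
# The values below are Adam's standard defaults from Kingma & Ba (2014),
# shown as an assumption of what an unspecified setup typically falls back
# to; the authors' actual settings are unknown.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3,  # default step size
    beta_1=0.9,          # first-moment (momentum) decay rate
    beta_2=0.999,        # second-moment decay rate
    epsilon=1e-8,        # numerical-stability constant
)
```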
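Finally, the one architectural detail quoted in the Experiment Setup row, an MLP with a single hidden layer of size 256 and ReLU nonlinearities, translates directly into a short Keras sketch; the linear output layer and all dimensions other than the hidden size are assumptions.

```python
# Sketch of the quoted MLP baseline: one hidden layer of width 256 with
# ReLU, as stated in the paper. The linear output head and its dimension
# are assumptions; the input width is inferred on the first call.
import tensorflow as tf

def build_mlp(output_dim):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),  # stated hidden layer
        tf.keras.layers.Dense(output_dim),              # assumed output head
    ])

model = build_mlp(output_dim=9)  # e.g., one value per drum category (assumed)
```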