Music Transformer: Generating Music with Long-Term Structure

Authors: Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Maestro, and obtain state-of-the-art results on the latter. (A minimal sketch of the relative attention computation appears after the table.)
Researcher Affiliation | Industry | Google Brain {annahuang,avaswani,usz,noam,iansimon,fjord,adai,mhoffman,noms,deck}@google.com
Pseudocode | No | The paper describes procedures with numbered steps and figures (Figure 1 and Figure 2) but does not contain a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | No | The paper refers to an existing framework (the Tensor2Tensor framework) and provides links to listening samples and a blog post, but does not explicitly state that the source code for their specific methodology is released or provide a link to it.
Open Datasets | Yes | JSB Chorales dataset: https://github.com/czhuang/JSB-Chorales-dataset and Maestro dataset: https://magenta.tensorflow.org/datasets/maestro
Dataset Splits | Yes | The result is 295 / 60 / 75 (train / validation / test) unique compositions, corresponding to 954 / 105 / 125 performances that last for 140 / 15 / 17 hours and contain 5.06 / 0.54 / 0.57 million notes.
Hardware Specification | No | The paper mentions "allowing us to use GPUs to train the relative self-attention Transformer on long sequences" and refers to "a GPU with 16GB memory" in Table 1, but does not specify exact GPU models, CPU types, or other detailed hardware specifications.
Software Dependencies | No | The paper states "We implemented our attention mechanisms in the Tensor2Tensor framework (Vaswani et al., 2018)" but does not specify a version number for this framework or any other software dependencies.
Experiment Setup | Yes | For the Transformer models (abbreviated as TF), we implemented our attention mechanisms in the Tensor2Tensor framework (Vaswani et al., 2018). We use 8 heads, and keep the query, key (att) and value hidden size (hs) fixed within a config. We tuned the number of layers (L in {4, 5, 6}), attention hidden size (att in {256, 512}) and pointwise feedforward hidden size (ff in {512, 1024}). We used the default hyperparameters for training, with a 0.1 learning rate and early stopping.
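
The hyperparameter sweep quoted in the Experiment Setup row can be written out explicitly. The sketch below simply enumerates the reported grid (8 heads, L in {4, 5, 6}, attention hidden size in {256, 512}, feedforward hidden size in {512, 1024}, learning rate 0.1, early stopping); the dictionary keys are illustrative placeholders and are not the actual Tensor2Tensor hyperparameter names.

from itertools import product

# Fixed settings reported in the paper; key names are placeholders,
# not actual Tensor2Tensor hparam names.
BASE = {"num_heads": 8, "learning_rate": 0.1, "early_stopping": True}

# Tuned settings: number of layers (L), attention hidden size (att),
# and pointwise feedforward hidden size (ff).
GRID = {
    "num_layers": [4, 5, 6],
    "attention_hidden_size": [256, 512],
    "feedforward_hidden_size": [512, 1024],
}

configs = [dict(BASE, **dict(zip(GRID, values))) for values in product(*GRID.values())]
print(len(configs))  # 3 * 2 * 2 = 12 candidate configurations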
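
For reference on the relative attention mechanism cited under Research Type, the following is a minimal single-head NumPy sketch of causal attention with relative position biases, using the "skewing" reshape described in the paper. Variable names, shapes, and the plain-NumPy setting are assumptions for illustration; this is not the authors' Tensor2Tensor implementation.

import numpy as np

def relative_global_attention(q, k, v, e_rel):
    # Single-head causal attention with relative position biases.
    # Minimal sketch of the "skewing" trick; not the authors' code.
    #   q, k, v : (L, d) query/key/value matrices for one head.
    #   e_rel   : (L, d) relative position embeddings, ordered from the
    #             most distant relative position -(L-1) up to distance 0.
    L, d = q.shape

    # Content-based logits, as in standard dot-product attention.
    content = q @ k.T                        # (L, L)

    # Position-based logits q_i . e_{j-i}, computed without materialising
    # an (L, L, d) tensor of per-pair relative embeddings.
    rel = q @ e_rel.T                        # (L, L); column c <-> distance c - (L - 1)
    rel = np.pad(rel, ((0, 0), (1, 0)))      # prepend a dummy zero column -> (L, L + 1)
    rel = rel.reshape(L + 1, L)[1:]          # reshape, drop first row -> (L, L), so that
                                             # rel[i, j] = q_i . e_rel[j - i + L - 1]

    logits = (content + rel) / np.sqrt(d)

    # Causal mask: position i attends only to positions j <= i.
    logits = np.where(np.tril(np.ones((L, L), dtype=bool)), logits, -1e9)

    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                       # (L, d)

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
L, d = 6, 8
out = relative_global_attention(rng.normal(size=(L, d)),
                                rng.normal(size=(L, d)),
                                rng.normal(size=(L, d)),
                                rng.normal(size=(L, d)))
print(out.shape)  # (6, 8)

The entries that the reshape leaves misaligned all fall at future positions (j > i), so the causal mask makes them irrelevant; this is what lets the relative term be computed with only (L, L) intermediates.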