Music Transformer: Generating Music with Long-Term Structure
Authors: Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Maestro, and obtain state-of-the-art results on the latter. |
| Researcher Affiliation | Industry | Google Brain {annahuang,avaswani,usz,noam,iansimon,fjord,adai,mhoffman,noms,deck}@google.com |
| Pseudocode | No | The paper describes procedures with numbered steps and figures (Figure 1 and Figure 2) but does not contain a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | The paper refers to an existing framework ("Tensor2Tensor framework") and provides links to listening samples and a blog post, but does not explicitly state that the source code for their specific methodology is released or provide a link to it. |
| Open Datasets | Yes | JSB Chorales dataset: https://github.com/czhuang/JSB-Chorales-dataset and Maestro dataset: https://magenta.tensorflow.org/datasets/maestro |
| Dataset Splits | Yes | The Maestro split is 295 / 60 / 75 unique compositions (train / validation / test), corresponding to 954 / 105 / 125 performances that last for 140 / 15 / 17 hours and contain 5.06 / 0.54 / 0.57 million notes. |
| Hardware Specification | No | The paper mentions "allowing us to use GPUs to train the relative self-attention Transformer on long sequences" and refers to "a GPU with 16GB memory" in Table 1, but does not specify exact GPU models, CPU types, or other detailed hardware specifications. |
| Software Dependencies | No | The paper states "We implemented our attention mechanisms in the Tensor2Tensor framework (Vaswani et al., 2018)" but does not specify a version number for this framework or any other software dependencies. |
| Experiment Setup | Yes | For the Transformer models (abbreviated as TF), we implemented our attention mechanisms in the Tensor2Tensor framework (Vaswani et al., 2018) and used the default hyperparameters for training, with 0.1 learning rate and early stopping. We use 8 heads, and keep the query, key (att) and value hidden size (hs) fixed within a config. We tuned number of layers (L in {4, 5, 6}), attention hidden size (att in {256, 512}) and pointwise feedforward hidden size (ff in {512, 1024}). (The attention mechanism and this hyperparameter sweep are sketched after the table.) |
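
The attention mechanism cited in the rows above is the paper's memory-efficient relative self-attention. The NumPy sketch below illustrates the published "skewing" procedure that converts the query-by-relative-embedding product into relative-position logits and adds them to the content logits; the function names, single-head framing, and shapes are illustrative assumptions, not the authors' Tensor2Tensor code.

```python
import numpy as np

def skew(rel):
    """Skewing trick from Music Transformer.

    rel has shape (L, L) with rel[i, k] = q_i . e_k, where e_k is the
    relative embedding for distance k - (L - 1). After skewing,
    out[i, j] = q_i . e_{(L-1) + (j - i)}, i.e. the logit for relative
    distance j - i (valid for j <= i under a causal mask).
    """
    L = rel.shape[0]
    padded = np.pad(rel, ((0, 0), (1, 0)))   # prepend a zero column -> (L, L+1)
    reshaped = padded.reshape(L + 1, L)      # reinterpret the buffer as (L+1, L)
    return reshaped[1:, :]                   # drop the first row -> (L, L)

def relative_attention(Q, K, V, Er, mask):
    """Single-head causal self-attention with relative-position logits.

    Q, K, V: (L, d); Er: (L, d) relative embeddings for distances -(L-1)..0;
    mask: (L, L) boolean, True where attention is allowed.
    Illustrative sketch only, not the paper's Tensor2Tensor implementation.
    """
    d = Q.shape[-1]
    logits = Q @ K.T                         # content-based logits (L, L)
    rel_logits = skew(Q @ Er.T)              # position-based logits via skewing
    scores = (logits + rel_logits) / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)    # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

if __name__ == "__main__":
    L, d = 6, 4
    rng = np.random.default_rng(0)
    Q, K, V, Er = (rng.standard_normal((L, d)) for _ in range(4))
    out = relative_attention(Q, K, V, Er, np.tril(np.ones((L, L), dtype=bool)))
    print(out.shape)  # (6, 4)
```

The skewing step avoids materializing the full (L, L, d) tensor of per-pair relative embeddings, which is what makes relative attention feasible on the long sequences mentioned in the Hardware Specification row.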
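
The hyperparameter sweep in the Experiment Setup row can likewise be enumerated explicitly. The grid below assumes a full cross-product of the three tuned settings (the paper reports the tuned ranges but not the exact search strategy), with 8 heads fixed as quoted:

```python
from itertools import product

# Hypothetical enumeration of the tuned Transformer (TF) configurations:
# 8 heads are fixed, while layers (L), attention hidden size (att) and
# pointwise feed-forward hidden size (ff) are swept over the reported ranges.
LAYERS = (4, 5, 6)
ATT_HIDDEN = (256, 512)
FF_HIDDEN = (512, 1024)

configs = [
    {"num_heads": 8, "num_layers": n_layers, "att_hidden": att, "ff_hidden": ff}
    for n_layers, att, ff in product(LAYERS, ATT_HIDDEN, FF_HIDDEN)
]
print(len(configs))  # 12 candidate configurations under a full cross-product
```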