Symbolic Music Generation with Transformer-GANs

Authors: Aashiq Muhamed, Liang Li, Xingjian Shi, Suri Yaddanapudi, Wayne Chi, Dylan Jackson, Rahul Suresh, Zachary C. Lipton, Alex J. Smola

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate via human evaluations and a new discriminative metric that the music generated by our approach outperforms a baseline trained with likelihood maximization, the state-of-the-art Music Transformer, and other GANs used for sequence generation. 57% of people prefer music generated via our approach while 43% prefer Music Transformer.
Researcher Affiliation | Collaboration | Aashiq Muhamed (1), Liang Li (1), Xingjian Shi (1), Suri Yaddanapudi (1), Wayne Chi (1), Dylan Jackson (1), Rahul Suresh (1), Zachary C. Lipton (2), Alexander J. Smola (1); (1) Amazon Web Services, (2) Carnegie Mellon University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology. It only provides a link to audio samples: "Samples can be found at https://tinyurl.com/y6awtlv7."
Open Datasets | Yes | We benchmark our models on the MAESTRO MIDI V1 dataset (Hawthorne et al. 2019), which contains over 200 hours of paired audio and MIDI recordings from ten years of the International Piano-e-Competition.
Dataset Splits | Yes | The dataset is split 80/10/10 into training/validation/evaluation sets.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments (e.g., specific GPU/CPU models, processor types, or memory details).
Software Dependencies | No | The paper mentions using "Tensor2Tensor (Vaswani et al. 2018)" for Music Transformer but does not specify a version number for this or any other software dependency needed for replication.
Experiment Setup | Yes | Hyperparameter configuration: The hyperparameters we used for the Transformer-XL architecture are shown in Table 3. For training, we used a 0.004 initial learning rate, an inverse square root scheduler, and the Adam optimizer. We used a target length of 128 for both training and evaluation, since we found this value offers a reasonable trade-off between training time and performance on metrics. Since TBPTT addresses the memory bottleneck, our framework can train on sequence lengths longer than 128. We set the memory length of the Transformer-XL to 1024 during training and 2048 during evaluation. During sample generation, we set the memory length equal to the number of tokens to generate. We observed, as in Dai et al. (2019), that NLL and generated music quality were sensitive to memory length. We introduced a reset-memory feature into the Transformer-XL training process that clears the Transformer-XL memory at the beginning of each new MIDI file. We report the baseline models with the lowest validation NLL. All our GANs and baselines use the same Transformer-XL configuration. We set the sequence length of generated samples during adversarial training to 128 (equal to the target length). Our GAN generator is initialized from our best NLL-trained baseline model. We follow an alternating training procedure, updating the generator and discriminator with the NLL and GAN losses; the NLL loss is applied five times as often as the GAN loss. We used βmax = 100 in all our experiments, as in Nie, Narodytska, and Patel (2019).
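
For concreteness, the optimizer setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal PyTorch illustration, not the authors' code: only the Adam optimizer, the 0.004 initial learning rate, and the inverse square root schedule come from the paper; the warmup length and the stand-in model are assumptions for illustration.

```python
import torch

# Minimal sketch of the reported optimization setup: Adam with a 0.004 initial
# learning rate and an inverse square root schedule. The linear layer and the
# warmup length below are placeholders, not values from the paper.
model = torch.nn.Linear(512, 512)  # stand-in for the Transformer-XL generator
optimizer = torch.optim.Adam(model.parameters(), lr=0.004)

def inverse_sqrt_factor(step, warmup=4000):
    """Multiplicative LR factor: linear warmup, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    return min(step / warmup, (warmup / step) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt_factor)

# Per training step: call optimizer.step() followed by scheduler.step().
```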
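
The alternating NLL/GAN update schedule described above can likewise be sketched in a few lines. Only the 5:1 NLL-to-GAN ratio, the 128-token adversarial sequence length, and the NLL-pretrained generator initialization come from the paper; the callables nll_step and gan_step and the counter-based scheduling are hypothetical scaffolding.

```python
from itertools import cycle

NLL_TO_GAN_RATIO = 5   # NLL updates occur five times as often as GAN updates (from the paper)
ADV_SEQ_LEN = 128      # sequence length of generated samples during adversarial training

def train_adversarial(generator, discriminator, loader, nll_step, gan_step, num_steps):
    """Alternate maximum-likelihood and adversarial updates of an NLL-pretrained generator.

    `nll_step` and `gan_step` are user-supplied callables (hypothetical names)
    performing one teacher-forced NLL update and one GAN update, respectively.
    """
    batches = cycle(loader)
    for step in range(num_steps):
        batch = next(batches)
        if step % (NLL_TO_GAN_RATIO + 1) < NLL_TO_GAN_RATIO:
            nll_step(generator, batch)                              # likelihood update
        else:
            gan_step(generator, discriminator, batch, ADV_SEQ_LEN)  # adversarial update
```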