FIGARO: Controllable Music Generation using Learned and Expert Features
Authors: Dimitri von Rütte, Luca Biggio, Yannic Kilcher, Thomas Hofmann
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FIGARO on its ability to adhere to the prescribed condition by comparing it to state-of-the-art methods for controllable symbolic music generation (Choi et al., 2020; Wu & Yang, 2021). We demonstrate empirically that our technique outperforms the state-of-the-art in controllable generation and sample quality. |
| Researcher Affiliation | Academia | Dimitri von Rütte, Luca Biggio, Yannic Kilcher, Thomas Hofmann, Department of Computer Science, ETH Zürich, dimitri.vonrutte@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Expert Description |
| Open Source Code | Yes | We also release the source code and model weights for anyone to download and use freely. Our secondary contribution is REMI+, an extension to the REMI input representation (Huang & Yang, 2020) which opens the way to multi-track, multi-time-signature music. Code and model weights are available through GitHub (https://github.com/dvruette/figaro). |
| Open Datasets | Yes | We use the Lakh MIDI dataset (Raffel, 2016) as training data in all of our experiments, which to the best of our knowledge is the largest publicly available symbolic music dataset. |
| Dataset Splits | Yes | We use an 80%-10%-10% training-validation-test split. |
| Hardware Specification | Yes | Each model is trained for 24 hours on 4 Nvidia RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer" and implies the use of deep learning frameworks, but it does not specify version numbers for key software dependencies like Python or PyTorch. |
| Experiment Setup | Yes | We train each model for 100k steps with a batch size of 512 sequences. Models are optimized using the Adam optimizer (Kingma & Ba, 2017) with β1 = 0.9, β2 = 0.999, ε = 10⁻⁶ and 0.01 weight decay. We use the inverse-square-root learning rate schedule with initial constant warmup at 10⁻⁴, given by 10⁻⁴ / max(1, √(n/N)), where N = 4000 is the number of warmup steps. |
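
The optimizer and learning-rate schedule quoted in the experiment-setup row can be illustrated with a short sketch. The paper does not state its deep learning framework, so this assumes PyTorch; the placeholder `model` and the names `PEAK_LR`, `WARMUP_STEPS`, and `inv_sqrt_schedule` are hypothetical and only mirror the quoted hyperparameters.

```python
import torch

# Hypothetical sketch of the quoted setup: Adam with beta1=0.9, beta2=0.999,
# eps=1e-6, weight decay 0.01, and an inverse-square-root LR schedule with
# constant warmup at 1e-4 for N=4000 steps.

PEAK_LR = 1e-4        # constant warmup learning rate
WARMUP_STEPS = 4000   # N in the paper
WEIGHT_DECAY = 0.01

# Placeholder module standing in for the FIGARO transformer.
model = torch.nn.Linear(512, 512)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=PEAK_LR,
    betas=(0.9, 0.999),
    eps=1e-6,
    weight_decay=WEIGHT_DECAY,
)

def inv_sqrt_schedule(step: int) -> float:
    """LR multiplier: 1.0 during warmup, then decays as 1/sqrt(step / N)."""
    return 1.0 / max(1.0, (step / WARMUP_STEPS) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inv_sqrt_schedule)

# In the training loop (100k steps, batch size 512 per the paper), one would
# call optimizer.step() followed by scheduler.step() after each batch.
```

With this multiplier, the learning rate stays at 10⁻⁴ for the first 4000 steps and then follows 10⁻⁴/√(n/N), e.g. dropping to 5×10⁻⁵ at step 16000, matching the quoted formula.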