Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
Authors: Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our model (Seq-U-Net) to a variety of tasks including language and audio generation. In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance on real-world tasks. |
| Researcher Affiliation | Collaboration | ¹Queen Mary University of London, ²Spotify |
| Pseudocode | No | The paper does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/f90/Seq-U-Net |
| Open Datasets | Yes | We perform character-based language modelling, where the task is to predict the next character given a history of previously observed ones, on the PTB dataset [Marcus et al., 1993]. ... For our second experiment, we perform word-based language modelling, which involves predicting the next word following a given sequence of words. As in the previous experiment, we use the PTB dataset with a vocabulary of 10,000 words. ... To test whether our model can capture long-term dependencies found in complex real-world sequences, we apply it to the generation of audio waveforms, using the residual variant presented in Section 3.2. ... In particular, we use the classical piano recordings as used by Dieleman et al. [2018] amounting to about 607 hours in duration, and partition them into a training and test set, while avoiding pieces overlapping between the two partitions. |
| Dataset Splits | Yes | We optimise each model for 100 epochs using a batch size of 16 and an Adam optimiser with initial learning rate α, which is reduced by half if validation performance did not improve after P epochs and more than 10 epochs have passed since the beginning of training. Finally, the model that performed best on the validation set is selected. To prevent the training procedure from favouring one model over the other, we perform a hyper-parameter optimisation over the learning rate α ∈ [e⁻¹², e⁻²] and optional gradient clipping with magnitudes between [0.01, 1.0]. This hyper-parameter optimisation is performed for each combination of model and task using a tree of Parzen estimators to find the minimum validation loss. |
| Hardware Specification | Yes | For time and memory measurements, we use a single NVIDIA GTX 1080 GPU with PyTorch 1.2, CUDA 9 and cuDNN 7.5.5. |
| Software Dependencies | Yes | For time and memory measurements, we use a single NVIDIA GTX 1080 GPU with PyTorch 1.2, CUDA 9 and cuDNN 7.5.5. (A short version-check snippet follows the table.) |
| Experiment Setup | Yes | We optimise each model for 100 epochs using a batch size of 16 and an Adam optimiser with initial learning rate α, which is reduced by half if validation performance did not improve after P epochs and more than 10 epochs have passed since the beginning of training. Finally, the model that performed best on the validation set is selected. To prevent the training procedure from favouring one model over the other, we perform a hyper-parameter optimisation over the learning rate α ∈ [e⁻¹², e⁻²] and optional gradient clipping with magnitudes between [0.01, 1.0]. This hyper-parameter optimisation is performed for each combination of model and task using a tree of Parzen estimators to find the minimum validation loss. All results are shown in Table 1, using the hyper-parameters shown in Table 2. (A hedged training-loop and TPE-search sketch follows the table.) |
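The Dataset Splits and Experiment Setup rows describe a training loop (Adam, learning rate halved on a validation plateau after P epochs but not before epoch 10, best-on-validation model selection, optional gradient clipping) wrapped in a tree-of-Parzen-estimators search over α ∈ [e⁻¹², e⁻²] and a clipping magnitude in [0.01, 1.0]. The sketch below is a minimal illustration of that recipe using PyTorch and Hyperopt; the model, data, epoch budget and patience value are placeholders rather than values taken from the paper or its repository at https://github.com/f90/Seq-U-Net.

```python
import copy
import torch
from hyperopt import fmin, tpe, hp


def build_model():
    # Placeholder network; the actual Seq-U-Net lives at
    # https://github.com/f90/Seq-U-Net.
    return torch.nn.Linear(10, 10)


def run_epoch(model, optimizer, clip, train=True):
    # Dummy one-batch "epoch" so the sketch runs end to end;
    # replace with the real task loss and data loaders.
    x = torch.randn(16, 10)
    loss = model(x).pow(2).mean()
    if train:
        optimizer.zero_grad()
        loss.backward()
        if clip is not None:
            # The paper mentions clipping magnitudes in [0.01, 1.0];
            # norm-based clipping is assumed here.
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
    return loss.item()


def train_model(lr, clip, epochs=100, patience=5):
    # Adam with initial LR, halve the LR when validation stalls for
    # `patience` epochs (only after the first 10 epochs), and keep the
    # best-on-validation weights. `patience` stands in for the paper's P.
    model = build_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stale = float("inf"), None, 0
    for epoch in range(epochs):
        run_epoch(model, optimizer, clip, train=True)
        val = run_epoch(model, optimizer, clip, train=False)
        if val < best_val:
            best_val, best_state, stale = val, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
        if stale >= patience and epoch >= 10:
            for group in optimizer.param_groups:
                group["lr"] *= 0.5
            stale = 0
    return best_val  # best_state would be the selected model


# TPE search: lr sampled as exp(U(-12, -2)), i.e. in [e^-12, e^-2].
space = {
    "lr": hp.loguniform("lr", -12, -2),
    "clip": hp.uniform("clip", 0.01, 1.0),
}
best = fmin(fn=lambda p: train_model(p["lr"], p["clip"]),
            space=space, algo=tpe.suggest, max_evals=20)
print(best)
```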
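The reported measurement environment is a single NVIDIA GTX 1080 with PyTorch 1.2, CUDA 9 and cuDNN. A snippet like the following, using only standard PyTorch attributes, records the corresponding versions of a reproduction environment for comparison.

```python
import torch

# Report the framework, CUDA and cuDNN versions plus the visible GPU,
# to compare against the configuration reported in the paper.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```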