Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
Authors: Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our model (Seq-U-Net) to a variety of tasks including language and audio generation. In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance on real-world tasks. |
| Researcher Affiliation | Collaboration | ¹Queen Mary University of London, ²Spotify |
| Pseudocode | No | The paper does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/f90/Seq-U-Net |
| Open Datasets | Yes | We perform character-based language modelling, where the task is to predict the next character given a history of previously observed ones, on the PTB dataset [Marcus et al., 1993]. ... For our second experiment, we perform word-based language modelling, which involves predicting the next word following a given sequence of words. As in the previous experiment, we use the PTB dataset with a vocabulary of 10,000 words. ... To test whether our model can capture long-term dependencies found in complex real-world sequences, we apply it to the generation of audio waveforms, using the residual variant presented in Section 3.2. ... In particular, we use the classical piano recordings as used by Dieleman et al. [2018] amounting to about 607 hours in duration, and partition them into a training and test set, while avoiding pieces overlapping between the two partitions. |
| Dataset Splits | Yes | We optimise each model for 100 epochs using a batch size of 16 and an Adam optimiser with initial learning rate α, which is reduced by half if validation performance did not improve after P epochs and more than 10 epochs have passed since the beginning of training. Finally, the model that performed best on the validation set is selected. To prevent the training procedure from favouring one model over the other, we perform a hyper-parameter optimisation over the learning rate α ∈ [e⁻¹², e⁻²] and optional gradient clipping with magnitudes between [0.01, 1.0]. This hyper-parameter optimisation is performed for each combination of model and task using a tree of Parzen estimators to find the minimum validation loss. |
| Hardware Specification | Yes | For time and memory measurements, we use a single NVIDIA GTX 1080 GPU with PyTorch 1.2, CUDA 9 and cuDNN 7.5.5. |
| Software Dependencies | Yes | For time and memory measurements, we use a single NVIDIA GTX 1080 GPU with PyTorch 1.2, CUDA 9 and cuDNN 7.5.5. (A short version-check snippet follows the table.) |
| Experiment Setup | Yes | We optimise each model for 100 epochs using a batch size of 16 and an Adam optimiser with initial learning rate α, which is reduced by half if validation performance did not improve after P epochs and more than 10 epochs have passed since the beginning of training. Finally, the model that performed best on the validation set is selected. To prevent the training procedure from favouring one model over the other, we perform a hyper-parameter optimisation over the learning rate α ∈ [e⁻¹², e⁻²] and optional gradient clipping with magnitudes between [0.01, 1.0]. This hyper-parameter optimisation is performed for each combination of model and task using a tree of Parzen estimators to find the minimum validation loss. All results are shown in Table 1, using the hyper-parameters shown in Table 2. (A hedged training-loop and TPE-search sketch follows the table.) |
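The Dataset Splits and Experiment Setup rows describe a training loop (Adam, learning rate halved on a validation plateau after P epochs but not before epoch 10, best-on-validation model selection, optional gradient clipping) wrapped in a tree-of-Parzen-estimators search over α ∈ [e⁻¹², e⁻²] and a clipping magnitude in [0.01, 1.0]. The sketch below is a minimal illustration of that recipe using PyTorch and Hyperopt; the model, data, epoch budget and patience value are placeholders rather than values taken from the paper or its repository at https://github.com/f90/Seq-U-Net.

```python
import copy
import torch
from hyperopt import fmin, tpe, hp


def build_model():
    # Placeholder network; the actual Seq-U-Net lives at
    # https://github.com/f90/Seq-U-Net.
    return torch.nn.Linear(10, 10)


def run_epoch(model, optimizer, clip, train=True):
    # Dummy one-batch "epoch" so the sketch runs end to end;
    # replace with the real task loss and data loaders.
    x = torch.randn(16, 10)
    loss = model(x).pow(2).mean()
    if train:
        optimizer.zero_grad()
        loss.backward()
        if clip is not None:
            # The paper mentions clipping magnitudes in [0.01, 1.0];
            # norm-based clipping is assumed here.
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
    return loss.item()


def train_model(lr, clip, epochs=100, patience=5):
    # Adam with initial LR, halve the LR when validation stalls for
    # `patience` epochs (only after the first 10 epochs), and keep the
    # best-on-validation weights. `patience` stands in for the paper's P.
    model = build_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stale = float("inf"), None, 0
    for epoch in range(epochs):
        run_epoch(model, optimizer, clip, train=True)
        val = run_epoch(model, optimizer, clip, train=False)
        if val < best_val:
            best_val, best_state, stale = val, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
        if stale >= patience and epoch >= 10:
            for group in optimizer.param_groups:
                group["lr"] *= 0.5
            stale = 0
    return best_val  # best_state would be the selected model


# TPE search: lr sampled as exp(U(-12, -2)), i.e. in [e^-12, e^-2].
space = {
    "lr": hp.loguniform("lr", -12, -2),
    "clip": hp.uniform("clip", 0.01, 1.0),
}
best = fmin(fn=lambda p: train_model(p["lr"], p["clip"]),
            space=space, algo=tpe.suggest, max_evals=20)
print(best)
```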
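The reported measurement environment is a single NVIDIA GTX 1080 with PyTorch 1.2, CUDA 9 and cuDNN. A snippet like the following, using only standard PyTorch attributes, records the corresponding versions of a reproduction environment for comparison.

```python
import torch

# Report the framework, CUDA and cuDNN versions plus the visible GPU,
# to compare against the configuration reported in the paper.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```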