Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths

Authors: Ming Xu, Sourav Garg, Michael Milford, Stephen Gould

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DecDTW on two such applications, namely the audio-to-score alignment task in music information retrieval and the visual place recognition task in robotics, demonstrating state-of-the-art results in both. Our experiments involve two distinct application areas and utilise real-world datasets. For both applications, the goal is to use learning to improve the accuracy of predicted temporal alignments using a training set of labelled ground truth alignments. We will show that DecDTW yields state-of-the-art performance on these tasks.
Researcher Affiliation | Academia | Ming Xu (1,2), Sourav Garg (1), Michael Milford (1), Stephen Gould (2); 1 Queensland University of Technology, 2 Australian National University
Pseudocode | No | While the paper describes the dynamic programming approach in text, it does not provide a formal pseudocode or algorithm block. (A minimal sketch of the standard DTW recurrence is given after this table for reference.)
Open Source Code | Yes | We have implemented DecDTW in PyTorch (Paszke et al., 2019) with open source code available to reproduce all experiments at https://github.com/mingu6/declarativedtw.git.
Open Datasets | Yes | We use the dataset in Thickstun et al. (2020), which is comprised of 193 piano performances of 64 scores with ground truth alignments taken from a subset of the MAESTRO (Hawthorne et al., 2019) and Kernscores (Sapp, 2005) datasets. We source images and geotags from the Oxford RobotCar dataset (Maddern et al., 2017)... We provide our paired sequence dataset in the supplementary material.
Dataset Splits | Yes | These 193 performances are divided into train (94), validation (49) and test (50) splits. In total, we use 22k, 4k, and 1.7k sequence pairs for training, validation and testing, respectively.
Hardware Specification | Yes | Timings are evaluated on an Nvidia GeForce GTX 1080Ti 11GB.
Software Dependencies | No | The paper mentions PyTorch and the SciPy library (via a citation to version 1.0), but does not list specific version numbers for all key software components (e.g., Python, CUDA, other libraries) that would be needed for a reproducible setup.
Experiment Setup | Yes | We use the Adam (Kingma & Ba, 2015) optimiser with a learning rate of 0.0001 and a batch size of 5 for all methods. We trained for 300 epochs for DILATE and 20 epochs for all remaining methods and selected the model which yielded minimum TimeErr over the validation set. For DecDTW and Base (GDTW), we set the regularisation hyperparameter λ = 0.1, 0.7, 0.9 for CQT, chroma and melspec features, respectively. For the sequence fine-tuning experiments, we use a learning rate of 0.0001, batch size of 8 for a maximum of 10 epochs across all methods. (A schematic sketch of this optimisation setup is also given after this table.)
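As a reference for the pseudocode item above, here is a minimal sketch of the classic discrete DTW dynamic program in PyTorch. This is the textbook recurrence over a pairwise cost matrix, not the continuous-time GDTW formulation that DecDTW differentiates through; the function name dtw_cost and the toy inputs are our own illustrative choices.

```python
import torch

def dtw_cost(D):
    """Classic DTW dynamic program over a pairwise cost matrix D of shape (n, m).

    Returns the accumulated alignment cost using the standard recurrence
    R[i, j] = D[i, j] + min(R[i-1, j], R[i, j-1], R[i-1, j-1]).
    Note: this is the discrete textbook formulation, not the paper's GDTW-based
    continuous-time optimisation problem.
    """
    n, m = D.shape
    R = torch.full((n + 1, m + 1), float("inf"), dtype=D.dtype)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + torch.min(
                torch.stack((R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]))
            )
    return R[n, m]

# Toy usage: cost matrix from two feature sequences x (n x d) and y (m x d).
x, y = torch.randn(6, 4), torch.randn(9, 4)
D = torch.cdist(x, y)  # pairwise Euclidean costs
print(dtw_cost(D))
```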
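And for the experiment-setup item, a schematic sketch of the reported optimisation settings (Adam, learning rate 0.0001, batch size 5, 20 epochs, validation-based model selection on TimeErr). The feature network, loss, and data here are placeholders rather than the authors' pipeline; the actual components live in the linked repository.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(84, 64)                            # placeholder feature extractor
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 0.0001 as reported

def make_batch():
    # Dummy stand-in for a batch of 5 paired sequences (batch size from the paper).
    x = torch.randn(5, 100, 84)        # e.g. CQT-like input features
    target = torch.randn(5, 100, 64)   # stand-in for alignment supervision
    return x, target

best_val, best_state = float("inf"), None
for epoch in range(20):                # 20 epochs for non-DILATE methods; 300 for DILATE
    x, target = make_batch()
    optimiser.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)  # placeholder for the alignment loss
    loss.backward()
    optimiser.step()

    val_err = loss.item()              # placeholder for validation TimeErr
    if val_err < best_val:             # keep the checkpoint with minimum validation TimeErr
        best_val = val_err
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```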