Differentiable Dynamic Programming for Structured Prediction and Attention

Authors: Arthur Mensch, Mathieu Blondel

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation. ... We measure the performance of the different losses and regularizations on the four languages of the CoNLL 2003 dataset. Results are reported in Table 1, along with reference results with different pretrained embeddings. ... We perform a leave-one-out cross-validation of our model performance, learning the multinomial classifier on 9 pieces and assessing the quality of the alignment on the remaining piece. ... Experiments. We demonstrate structured attention layers with an LSTM encoder and decoder to perform French to English translation...
Researcher Affiliation | Collaboration | ¹Inria, CEA, Université Paris-Saclay, Gif-sur-Yvette, France; work performed at ²NTT Communication Science Laboratories, Kyoto, Japan.
Pseudocode | Yes | Pseudo-code is summarized in A.5. ... Pseudo-code for Vit_Ω(θ), as well as gradient and Hessian-product computations, is provided in B.2. ... Pseudo-code to compute DTW_Ω(θ), as well as its gradient and its Hessian products, is provided in B.3. (A minimal sketch of such a smoothed DP recursion appears below the table.)
Open Source Code | Yes | We have released an optimized and modular PyTorch implementation for reproduction and reuse.
Open Datasets | Yes | We measure the performance of the different losses and regularizations on the four languages of the CoNLL 2003 dataset. ... We use our framework to perform supervised audio-to-score alignment on the Bach10 dataset (Duan & Pardo, 2011).
Dataset Splits | Yes | We perform a leave-one-out cross-validation of our model performance, learning the multinomial classifier on 9 pieces and assessing the quality of the alignment on the remaining piece. (The protocol is sketched below the table.)
Hardware Specification | No | AM thanks Julien Mairal, Inria Thoth and Inria Parietal for lending him the computational resources necessary to run the experiments. However, no specific hardware details (e.g., GPU/CPU models, memory) are provided.
Software Dependencies | No | We have released an optimized and modular PyTorch implementation for reproduction and reuse. However, a specific version number for PyTorch or other software dependencies is not provided.
Experiment Setup | Yes | Architecture details are provided in C.1. ... We set the cost between an audio frame and a key to be the log-likelihood of this key given a multinomial linear classifier: $\forall i \in [N_A],\ l_i \triangleq \log(\operatorname{softmax}(W a_i + c)) \in \mathbb{R}^K$ and $\forall j \in [N_B],\ \theta_{i,j} = l_{i,b_j}$, where $(W, c) \in \mathbb{R}^{D \times K} \times \mathbb{R}^K$ are learned classifier parameters (see the first sketch below). ... We demonstrate structured attention layers with an LSTM encoder and decoder...
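
To make the quoted cost construction concrete, here is a minimal PyTorch sketch: per-frame key log-likelihoods $l_i = \log \operatorname{softmax}(W a_i + c)$, gathered into the alignment cost matrix $\theta_{i,j} = l_{i,b_j}$. The shapes and variable names (D, K, N_A, N_B, a, b) are placeholders of ours, not taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

D, K = 20, 12        # feature dimension, number of keys
N_A, N_B = 100, 30   # audio frames, score positions

W = torch.randn(D, K, requires_grad=True)  # learned classifier weights
c = torch.randn(K, requires_grad=True)     # learned classifier bias
a = torch.randn(N_A, D)                    # audio frame features a_i
b = torch.randint(0, K, (N_B,))            # key index b_j at score position j

l = F.log_softmax(a @ W + c, dim=1)        # (N_A, K): l_i in R^K
theta = l[:, b]                            # (N_A, N_B): theta[i, j] = l[i, b_j]
```

The resulting `theta` can then be fed to a smoothed DTW layer, as sketched next.

The pseudo-code referenced in B.2/B.3 replaces the hard min of the classical DP recursion with a smoothed min_Ω. Below is a minimal sketch of DTW_Ω for the negentropy regularization, where min_Ω becomes a soft-min (a scaled log-sum-exp); we lean on PyTorch autograd for the gradient, whereas the paper derives a dedicated backward recursion. The smoothing parameter `gamma` is our naming, and this is a sketch, not the authors' optimized implementation.

```python
import torch

def softmin(values, gamma):
    # Smoothed min under negentropy regularization:
    # min_Omega(x) = -gamma * log(sum_i exp(-x_i / gamma))
    x = torch.stack(values)
    return -gamma * torch.logsumexp(-x / gamma, dim=0)

def dtw_omega(theta, gamma=1.0):
    """Smoothed DTW value for a cost matrix theta of shape (N_A, N_B)."""
    n_a, n_b = theta.shape
    inf = torch.tensor(float("inf"), dtype=theta.dtype)
    # v[i][j]: smoothed cost of aligning the length-i and length-j prefixes;
    # the +inf border keeps alignment paths inside the grid.
    v = [[inf] * (n_b + 1) for _ in range(n_a + 1)]
    v[0][0] = torch.zeros((), dtype=theta.dtype)
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            v[i][j] = theta[i - 1, j - 1] + softmin(
                [v[i - 1][j], v[i - 1][j - 1], v[i][j - 1]], gamma)
    return v[n_a][n_b]

theta = torch.randn(5, 7, requires_grad=True)
dtw_omega(theta).backward()
# theta.grad holds the gradient of DTW_Omega -- per the paper, an expected
# ("soft") alignment matrix.
```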
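
The leave-one-out protocol quoted above is straightforward to reproduce in outline: train the frame-level multinomial classifier on 9 of the 10 Bach10 pieces and evaluate on the held-out piece, rotating over all pieces. The sketch below uses synthetic features and scikit-learn's LogisticRegression as a stand-in multinomial classifier, and scores held-out classification accuracy rather than the paper's alignment metric; those substitutions are assumptions of ours, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# One (features, key-labels) pair per piece: 100 frames, D=20 features, K=12 keys.
pieces = [(rng.normal(size=(100, 20)), rng.integers(0, 12, size=100))
          for _ in range(10)]

accuracies = []
for held_out in range(len(pieces)):
    train = [p for i, p in enumerate(pieces) if i != held_out]
    X = np.concatenate([x for x, _ in train])
    y = np.concatenate([y for _, y in train])
    clf = LogisticRegression(max_iter=200).fit(X, y)  # multinomial classifier
    X_test, y_test = pieces[held_out]
    accuracies.append(clf.score(X_test, y_test))  # proxy for alignment quality
print(np.mean(accuracies))
```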
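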