Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

Authors: Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that it outperforms a strong baseline on character-level translation tasks from WMT 15, the algorithmic task of finding Eulerian circuits of graphs, and question generation from the text. Our analysis demonstrates that the model computes qualitatively intuitive alignments, converges faster than the baselines, and achieves superior performance with fewer parameters.
Researcher Affiliation | Collaboration | Francis Dutil, University of Montreal (MILA), frdutil@gmail.com; Caglar Gulcehre, University of Montreal (MILA), ca9lar@gmail.com; Adam Trischler, Microsoft Research Maluuba, adam.trischler@microsoft.com; Yoshua Bengio, University of Montreal (MILA), yoshua.umontreal@gmail.com
Pseudocode | Yes | Algorithm 1: Pseudocode for updating the alignment plan and commitment vector. Algorithm 2: Pseudocode for updating the repeat alignment and commitment vector. (A hedged sketch of the alignment-plan update appears below the table.)
Open Source Code | No | The paper states, 'Our implementation is based on the code available at https://github.com/nyu-dl/dl4mt-cdec', which refers to the baseline's code. There is no explicit statement or link indicating that the source code for the proposed PAG/rPAG methodology is publicly available.
Open Datasets | Yes | We evaluate our model and report results on character-level translation tasks from WMT 15 for English to German, English to Finnish, and English to Czech language pairs. ... SQUAD (Rajpurkar et al., 2016) is a question answering (QA) corpus...
Dataset Splits | Yes | We used 2000 examples from SQUAD’s training set for validation and used the official development set as a test set to evaluate our models. ... We use newstest2013 as our development set, newstest2014 as our 'Test 2014' and newstest2015 as our 'Test 2015' set. (These splits are restated in a short sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments. It discusses model parameters and training but not the underlying hardware.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific optimizer library versions), that would be needed to replicate the experiment environment precisely.
Experiment Setup | Yes | For training all models we use the Adam optimizer with initial learning rate set to 0.0002. We clip gradients with a threshold of 5 (Pascanu et al., 2013b) and set the number of planning steps (k) to 10 throughout. ... We used 600 units in (r)PAG’s encoder and decoder, while the baseline used 512 in the encoder and 1024 units in the decoder. ... We tested all models with a beam size of 15. (See the training-configuration sketch below the table.)
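
The pseudocode row refers to Algorithm 1, which alternates between recomputing a k-step alignment plan and shifting it forward under the control of a commitment vector, with attention read off the plan's first row. The NumPy sketch below illustrates that control flow only; the linear parameterisations (W_c, W_a, w_x, w_u) and the rounding-based discretisation are illustrative assumptions, not the paper's exact equations.

```python
# Minimal sketch of the alignment-plan / commitment update flow (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

def shift(m):
    """Shift entries up by one position along the first axis and zero-pad (the paper's shift operator)."""
    out = np.zeros_like(m)
    out[:-1] = m[1:]
    return out

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pag_step(A_prev, c_prev, h_prev, annotations, params):
    """One decoding step: recompute or shift the alignment plan, then attend."""
    g = float(np.round(c_prev[0]))                 # discretised commitment bit
    if g == 1.0:
        # Recompute the commitment plan and a candidate alignment plan.
        c_new = softmax(params["W_c"] @ h_prev)                        # (k,)
        scores = params["W_a"] @ h_prev                                # (k,)
        A_cand = scores[:, None] + (annotations @ params["w_x"]).T     # (k, src_len)
        u = sigmoid(annotations @ params["w_u"])                       # (src_len,) update gate
        A_new = (1.0 - u) * A_prev + u * A_cand                        # gated interpolation
    else:
        # Follow the existing plan: shift both the plan and the commitment vector.
        A_new = shift(A_prev)
        c_new = shift(c_prev)
    alpha = softmax(A_new[0])                      # attention weights = first row of the plan
    context = alpha @ annotations                  # context vector fed to the decoder
    return A_new, c_new, alpha, context

# Toy dimensions: k planning steps, src_len source tokens, d annotation size, d_h decoder state size.
k, src_len, d, d_h = 10, 7, 16, 16
params = {
    "W_c": rng.normal(size=(k, d_h)),
    "W_a": rng.normal(size=(k, d_h)),
    "w_x": rng.normal(size=(d, k)),
    "w_u": rng.normal(size=(d,)),
}
A, c = np.zeros((k, src_len)), np.ones(k)          # start "committed" so the first step plans
h, annotations = rng.normal(size=d_h), rng.normal(size=(src_len, d))
A, c, alpha, context = pag_step(A, c, h, annotations, params)
```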
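
The dataset-splits row can be restated compactly. The sketch below is illustrative: the loader calls are hypothetical placeholders, and the random selection of the 2000 held-out SQuAD examples is an assumption, since the paper does not describe how they were chosen.

```python
# Evaluation splits as reported in the paper (loader functions are hypothetical).
import random

WMT15_SPLITS = {
    "dev": "newstest2013",
    "test_2014": "newstest2014",
    "test_2015": "newstest2015",
}

def split_squad(train_examples, n_val=2000, seed=0):
    """Hold out 2000 training examples for validation; the official SQuAD
    dev set is then used as the test set."""
    rng = random.Random(seed)
    examples = list(train_examples)
    rng.shuffle(examples)  # assumption: the paper does not say how the 2000 examples were picked
    return examples[n_val:], examples[:n_val]

# train, val = split_squad(load_squad_train())   # load_squad_train() is hypothetical
# test = load_squad_dev()                        # official dev set used as the test set
```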
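
The experiment-setup row quotes concrete hyperparameters. Below is a minimal sketch, assuming a PyTorch-style training step with gradient-norm clipping in the spirit of Pascanu et al. (the paper's own implementation builds on the Theano-based dl4mt-cdec code); the model class and data iterator are hypothetical, and only the numbers in the config dict come from the paper.

```python
# Reported hyperparameters, plus a sketch of one training step (PyTorch rendering is an assumption).
import torch

CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 2e-4,     # "initial learning rate set to 0.0002"
    "grad_clip": 5.0,          # gradient clipping threshold of 5
    "planning_steps_k": 10,    # number of planning steps
    "pag_hidden_units": 600,   # (r)PAG encoder and decoder size
    "baseline_encoder": 512,   # baseline encoder size
    "baseline_decoder": 1024,  # baseline decoder size
    "beam_size": 15,           # beam size at test time
}

def train_step(model, batch, optimizer):
    """One optimisation step with Adam and gradient-norm clipping."""
    optimizer.zero_grad()
    loss = model(batch)  # assumed to return a scalar training loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CONFIG["grad_clip"])
    optimizer.step()
    return loss.item()

# Usage with a hypothetical PAGModel and batch iterator:
# model = PAGModel(hidden_units=CONFIG["pag_hidden_units"], k=CONFIG["planning_steps_k"])
# optimizer = torch.optim.Adam(model.parameters(), lr=CONFIG["learning_rate"])
# for batch in batches:
#     train_step(model, batch, optimizer)
```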