Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

Authors: Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

NeurIPS 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that it outperforms a strong baseline on character-level translation tasks from WMT 15, the algorithmic task of finding Eulerian circuits of graphs, and question generation from the text. Our analysis demonstrates that the model computes qualitatively intuitive alignments, converges faster than the baselines, and achieves superior performance with fewer parameters."
Researcher Affiliation | Collaboration | Francis Dutil, University of Montreal (MILA), EMAIL; Caglar Gulcehre, University of Montreal (MILA), EMAIL; Adam Trischler, Microsoft Research Maluuba, EMAIL; Yoshua Bengio, University of Montreal (MILA), EMAIL
Pseudocode | Yes | Algorithm 1: Pseudocode for updating the alignment plan and commitment vector. Algorithm 2: Pseudocode for updating the repeat alignment and commitment vector.
Open Source Code | No | The paper states, "Our implementation is based on the code available at https://github.com/nyu-dl/dl4mt-cdec", which refers to the baseline's code. There is no explicit statement or link indicating that the source code for the proposed PAG/rPAG models is publicly available.
Open Datasets | Yes | "We evaluate our model and report results on character-level translation tasks from WMT 15 for English to German, English to Finnish, and English to Czech language pairs. ... SQUAD (Rajpurkar et al., 2016) is a question answering (QA) corpus..."
Dataset Splits | Yes | "We used 2000 examples from SQUAD's training set for validation and used the official development set as a test set to evaluate our models. ... We use newstest2013 as our development set, newstest2014 as our 'Test 2014' and newstest2015 as our 'Test 2015' set."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments. It discusses model parameters and training but not the underlying hardware.
Software Dependencies | No | The paper does not provide ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific optimizer library versions), that would be needed to replicate the experiment environment precisely.
Experiment Setup | Yes | "For training all models we use the Adam optimizer with initial learning rate set to 0.0002. We clip gradients with a threshold of 5 (Pascanu et al., 2013b) and set the number of planning steps (k) to 10 throughout. ... We used 600 units in (r)PAG's encoder and decoder, while the baseline used 512 in the encoder and 1024 units in the decoder. ... We tested all models with a beam size of 15."
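The gradient clipping the setup quotes (a threshold of 5, following Pascanu et al., 2013b) is clipping by global norm. A minimal plain-Python sketch of that step is below; the function name `clip_gradients` and the flat list-of-floats representation are illustrative assumptions, not the authors' implementation, and the hyperparameters echoed in the comment are the ones quoted from the paper.

```python
import math

# Hyperparameters quoted in the paper's setup (illustrative constants):
# Adam learning rate 0.0002, clip threshold 5, planning steps k = 10,
# beam size 15 at test time.
CLIP_THRESHOLD = 5.0

def clip_gradients(grads, threshold=CLIP_THRESHOLD):
    """Rescale gradients so their global L2 norm does not exceed threshold.

    `grads` is a flat list of gradient components; hypothetical helper,
    sketching clipping-by-norm rather than the paper's actual code.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        scale = threshold / norm
        grads = [g * scale for g in grads]
    return grads
```

For example, a gradient vector with norm 10 such as `[6.0, 8.0]` is rescaled to norm 5, while a vector already at or below the threshold passes through unchanged.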