Sequence Modeling with Unconstrained Generation Order

Authors: Dmitrii Emelianenko, Elena Voita, Pavel Serdyukov

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that this model is superior to fixed order models on a number of sequence generation tasks, such as Machine Translation, Image-to-LaTeX and Image Captioning.
Researcher Affiliation | Collaboration | Dmitrii Emelianenko (1,2), Elena Voita (1,3), Pavel Serdyukov (1); affiliations: 1 Yandex, Russia; 2 National Research University Higher School of Economics, Russia; 3 University of Amsterdam, Netherlands
Pseudocode | Yes | Algorithm 1: Training procedure (simplified). A hedged sketch of such a training step is given after this table.
Open Source Code | Yes | The source code is available at https://github.com/TIXFeniks/neurips2019_intrus.
Open Datasets | Yes | We consider three sequence generation tasks: Machine Translation, Image-to-LaTeX and Image Captioning. For each, we now define input X and output Y, the datasets and the task-specific encoder we use. ...Our experiments include: En-Ru and Ru-En WMT14; En-Ja ASPEC [23]; En-Ar, En-De and De-En IWSLT14 Machine Translation data sets. ...We use the Image-to-LaTeX-140K [18, 26] data set. ...We use MSCOCO [27], the standard Image Captioning dataset.
Dataset Splits | No | The paper mentions using validation data to select the beam size ('selected using the validation data for both baseline and INTRUS'), but it does not report split percentages or sample counts for the training/validation/test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as the 'Moses tokenizer', 'BPE', and 'keras applications', but it does not specify version numbers or list other key software dependencies needed for reproducibility.
Experiment Setup | Yes | The models are trained until convergence with base learning rate 1.4e-3, 16,000 warm-up steps and batch size of 4,000 tokens. We vary the learning rate over the course of training according to [10] and follow their optimization technique. We use beam search with the beam between 4 and 64 selected using the validation data for both baseline and INTRUS, although our model benefits more when using even bigger beam sizes. The pretraining phase of INTRUS is 10^5 batches.
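
Regarding the 'Algorithm 1: Training procedure (simplified)' row: the report quotes only the algorithm's title and the 10^5-batch pretraining phase, not the algorithm itself. Below is a loose, hypothetical sketch of what a training step that samples generation orders might look like, assuming (from the paper's title and the pretraining note above) that orders are drawn uniformly at random during pretraining and from the model afterwards. It is not the authors' implementation; sample_model_order and trajectory_log_prob are placeholder names.

```python
# Hypothetical sketch only, not the authors' code. Assumes training samples an
# order in which the target tokens are produced and maximizes the log-probability
# of that trajectory: uniformly random orders during the reported 10^5-batch
# pretraining phase, model-chosen orders afterwards.
import random
from typing import Callable, List, Sequence, Tuple

PRETRAIN_BATCHES = 10**5  # pretraining length reported in the table above

def training_step(
    batch: Sequence[Tuple[object, Sequence[str]]],
    step: int,
    trajectory_log_prob: Callable[[object, Sequence[str], List[int]], float],
    sample_model_order: Callable[[object, Sequence[str]], List[int]],
) -> float:
    """Return the loss for one batch: negative log-probability of sampled trajectories."""
    loss = 0.0
    for x, y in batch:
        if step < PRETRAIN_BATCHES:
            order = random.sample(range(len(y)), len(y))  # uniform random order
        else:
            order = sample_model_order(x, y)              # order preferred by the model
        loss -= trajectory_log_prob(x, y, order)
    return loss  # to be minimized with the optimization technique of [10]
```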
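
The learning-rate schedule referenced as [10] is not reproduced in the excerpt above. Assuming it is the Transformer-style inverse-square-root schedule with linear warm-up (a common reading, not confirmed here), a minimal sketch of how the reported 1.4e-3 base rate and 16,000 warm-up steps could be combined is shown below; the normalization so that the peak equals the base rate is an assumption.

```python
# Hedged sketch: inverse-square-root ("Noam") learning-rate schedule with warm-up.
# Only the numbers 1.4e-3 and 16,000 come from the reported setup; the scaling
# so that the peak rate equals base_lr is an assumption.

def noam_lr(step: int, base_lr: float = 1.4e-3, warmup_steps: int = 16_000) -> float:
    """Linear warm-up to base_lr over warmup_steps, then inverse-sqrt decay."""
    step = max(step, 1)
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * warmup_steps ** 0.5 * scale

# Example: the peak learning rate is reached at the end of warm-up.
assert abs(noam_lr(16_000) - 1.4e-3) < 1e-9
```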