Sequence Modeling with Unconstrained Generation Order
Authors: Dmitrii Emelianenko, Elena Voita, Pavel Serdyukov
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that this model is superior to fixed order models on a number of sequence generation tasks, such as Machine Translation, Image-to-LaTeX and Image Captioning. |
| Researcher Affiliation | Collaboration | Dmitrii Emelianenko (1,2), Elena Voita (1,3), Pavel Serdyukov (1); 1 Yandex, Russia; 2 National Research University Higher School of Economics, Russia; 3 University of Amsterdam, Netherlands |
| Pseudocode | Yes | Algorithm 1: Training procedure (simplified) |
| Open Source Code | Yes | The source code is available at https://github.com/TIXFeniks/neurips2019_intrus. |
| Open Datasets | Yes | We consider three sequence generation tasks: Machine Translation, Image-to-LaTeX and Image Captioning. For each, we now define input X and output Y, the datasets and the task-specific encoder we use. ...Our experiments include: En-Ru and Ru-En WMT14; En-Ja ASPEC [23]; En-Ar, En-De and De-En IWSLT14 Machine Translation datasets. ...We use the Image-to-LaTeX-140K [18, 26] dataset. ...We use MSCOCO [27], the standard Image Captioning dataset. |
| Dataset Splits | No | The paper mentions using validation data for beam search selection ('selected using the validation data for both baseline and INTRUS'), but it does not report split percentages or sample counts for the training/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software such as the 'Moses tokenizer', 'BPE', and 'keras applications', but does not specify version numbers for these or for other key software dependencies needed for reproducibility. |
| Experiment Setup | Yes | The models are trained until convergence with base learning rate 1.4e-3, 16,000 warm-up steps and batch size of 4,000 tokens. We vary the learning rate over the course of training according to [10] and follow their optimization technique. We use beam search with the beam between 4 and 64 selected using the validation data for both baseline and INTRUS, although our model benefits more when using even bigger beam sizes. The pretraining phase of INTRUS is 10^5 batches. |
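
The experiment setup row references the learning-rate schedule of the paper's reference [10], i.e. the Transformer's inverse-square-root schedule with linear warm-up. The sketch below is a minimal illustration of that schedule using the reported values (base learning rate 1.4e-3, 16,000 warm-up steps); it assumes the base learning rate denotes the peak value reached at the end of warm-up, which the paper does not state explicitly, and the function name `transformer_lr` is ours.

```python
def transformer_lr(step: int, base_lr: float = 1.4e-3, warmup_steps: int = 16_000) -> float:
    """Inverse-square-root schedule with linear warm-up (reference [10]).

    Assumes `base_lr` is the peak rate reached at the end of warm-up; the
    exact parameterization is not given in the paper.
    """
    step = max(step, 1)
    # Grows linearly during warm-up, then decays proportionally to step^-0.5.
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    # Normalize so the schedule peaks at exactly base_lr when step == warmup_steps.
    return base_lr * scale / warmup_steps ** -0.5

# Example: learning rate early in warm-up, at the peak, and later in training.
for s in (1_000, 16_000, 64_000):
    print(f"step {s:>6}: lr = {transformer_lr(s):.2e}")
```

Normalizing by `warmup_steps ** -0.5` is one common way to express this schedule in terms of a peak rate; other codebases instead fold in a `d_model ** -0.5` factor, which would give the same shape with a different scale.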