Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
Authors: Eldan Cohen, Christopher Beck
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an empirical study of the behavior of beam search across three sequence synthesis tasks. We perform an extensive empirical evaluation over multiple tasks, models, datasets, and evaluation metrics. |
| Researcher Affiliation | Academia | Eldan Cohen¹, J. Christopher Beck¹. ¹Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada. Correspondence to: Eldan Cohen <ecohen@mie.utoronto.ca>. |
| Pseudocode | No | The paper describes the beam search algorithm mathematically in Section 2.1, but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper refers to using existing toolkits like 'fairseq-py toolkit' and 'Open NMT toolkit', but it does not state that the authors' own implementation code for the described methodology is open-source or provide a link to it. |
| Open Datasets | Yes | Machine Translation. We use the convolutional model by Gehring et al. (2017) implemented in the fairseq-py toolkit. We present results for two models, trained on WMT'14 En-Fr and En-De datasets and evaluated on newstest2014 En-Fr and En-De, respectively, with a vocabulary based on byte pair encoding (BPE; Sennrich et al., 2016). Summarization. We use the abstractive summarization model by Chopra et al. (2016) implemented in the OpenNMT toolkit (Klein et al., 2017). The model is trained and evaluated using Rush et al.'s (2015) test split of the Gigaword corpus (Graff et al., 2003). Image Captioning. We use the model by Vinyals et al. (2017), trained on the MSCOCO dataset (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions tuning M and N 'on a held-out validation set' but does not provide specific details on the dataset split (e.g., percentages, sample counts, or methodology for creating the split). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'fairseq-py toolkit' and 'Open NMT toolkit' but does not specify their version numbers or any other software dependencies with their respective versions. |
| Experiment Setup | No | The paper details the beam widths used for analysis and mentions tuning discrepancy thresholds, but it does not provide specific model training hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for the models used in the experiments. |
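Since the paper describes beam search only mathematically (Section 2.1) and provides no pseudocode, the following is a minimal sketch of standard beam search over per-step log-probabilities. The `toy_log_probs` scoring function and the EOS convention are illustrative assumptions, not the paper's models.

```python
import math

def beam_search(step_log_probs, beam_width, max_len, eos):
    """Minimal beam search sketch.

    step_log_probs(prefix) -> log-probabilities over the vocabulary
    for the next token, given the token prefix so far.
    """
    beams = [([], 0.0)]  # list of (token prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every vocabulary token.
        candidates = []
        for prefix, score in beams:
            for tok, lp in enumerate(step_log_probs(prefix)):
                candidates.append((prefix + [tok], score + lp))
        # Keep only the top-k candidates by total log-probability;
        # hypotheses ending in EOS are set aside as complete.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])

# Toy next-token distribution (an assumption for illustration):
# token 0 is always most likely, token 2 acts as EOS.
def toy_log_probs(prefix):
    return [math.log(p) for p in (0.6, 0.3, 0.1)]

best, score = beam_search(toy_log_probs, beam_width=2, max_len=4, eos=2)
```

With this toy distribution the search greedily extends token 0, illustrating the mechanism the paper studies: widening the beam changes which hypotheses survive the top-k cut, which is the lever behind the performance-degradation analysis.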