Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
Authors: Eldan Cohen, Christopher Beck
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an empirical study of the behavior of beam search across three sequence synthesis tasks. We perform an extensive empirical evaluation over multiple tasks, models, datasets, and evaluation metrics. |
| Researcher Affiliation | Academia | Eldan Cohen¹, J. Christopher Beck¹. ¹Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada. Correspondence to: Eldan Cohen <ecohen@mie.utoronto.ca>. |
| Pseudocode | No | The paper describes the beam search algorithm mathematically in Section 2.1, but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper refers to using existing toolkits like 'fairseq-py toolkit' and 'Open NMT toolkit', but it does not state that the authors' own implementation code for the described methodology is open-source or provide a link to it. |
| Open Datasets | Yes | Machine Translation. We use the convolutional model by Gehring et al. (2017) implemented in the fairseq-py toolkit. We present results for two models, trained on WMT'14 En-Fr and En-De datasets and evaluated on newstest2014 En-Fr and En-De, respectively, with a vocabulary based on byte pair encoding (BPE; Sennrich et al., 2016). Summarization. We use the abstractive summarization model by Chopra et al. (2016) implemented in the OpenNMT toolkit (Klein et al., 2017). The model is trained and evaluated using Rush et al.'s (2015) test split of the Gigaword corpus (Graff et al., 2003). Image Captioning. We use the model by Vinyals et al. (2017), trained on the MSCOCO dataset (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions tuning M and N 'on a held-out validation set' but does not provide specific details on the dataset split (e.g., percentages, sample counts, or methodology for creating the split). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'fairseq-py toolkit' and 'Open NMT toolkit' but does not specify their version numbers or any other software dependencies with their respective versions. |
| Experiment Setup | No | The paper details the beam widths used for analysis and mentions tuning discrepancy thresholds, but it does not provide specific model training hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for the models used in the experiments. |
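Since the paper describes beam search only mathematically (Section 2.1) and provides no pseudocode, the following is a minimal sketch of standard beam search over per-step log-probabilities. The `toy_log_probs` scoring function and the EOS convention are illustrative assumptions, not the paper's models.

```python
import math

def beam_search(step_log_probs, beam_width, max_len, eos):
    """Minimal beam search sketch.

    step_log_probs(prefix) -> log-probabilities over the vocabulary
    for the next token, given the token prefix so far.
    """
    beams = [([], 0.0)]  # list of (token prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every vocabulary token.
        candidates = []
        for prefix, score in beams:
            for tok, lp in enumerate(step_log_probs(prefix)):
                candidates.append((prefix + [tok], score + lp))
        # Keep only the top-k candidates by total log-probability;
        # hypotheses ending in EOS are set aside as complete.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
            if len(beams) == beam_width:
                break
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])

# Toy next-token distribution (an assumption for illustration):
# token 0 is always most likely, token 2 acts as EOS.
def toy_log_probs(prefix):
    return [math.log(p) for p in (0.6, 0.3, 0.1)]

best, score = beam_search(toy_log_probs, beam_width=2, max_len=4, eos=2)
```

With this toy distribution the search greedily extends token 0, illustrating the mechanism the paper studies: widening the beam changes which hypotheses survive the top-k cut, which is the lever behind the performance-degradation analysis.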