reproducibilityindex.ai

MLE-Guided Parameter Search for Task Loss Minimization in Neural Sequence Modeling

Authors: Sean Welleck, Kyunghyun Cho
Pages: 14032-14040

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation. (Section 5, Experiments; Section 5.1, Text Completion with GPT-2; Table 1: Text completion results (GPT-2, Wikitext-103 test set).)
Researcher Affiliation	Academia	Sean Welleck, Kyunghyun Cho. New York University. Correspondence to: wellecks@nyu.edu.
Pseudocode	Yes	Algorithm 1: MLE-guided parameter search (MGS).
Open Source Code	Yes	Code available at https://github.com/wellecks/mgs.
Open Datasets	Yes	We use the Wikitext-103 dataset (Merity et al. 2016). We experiment on the IWSLT '14 German to English task (Cettolo et al. 2014).
Dataset Splits	Yes	The resulting dataset consists of 874,556 training, 1,896 validation, and 2,162 test pairs.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper mentions 'fairseq' but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	We use 4 candidates, and compute training task loss with a max decoding length of 1.3 times the ground-truth length. Models are evaluated with a max decoding length of 500 tokens. We performed a grid search using α ∈ {0.1, 0.3, 0.5}, selecting α based on the validation task loss that the model is optimizing. We use 4 candidates and a grid search over noise ({0.01, 0.1, 1.0}) and α ({1.0, 10.0, 100.0}). For fine-tuning, we use a batch size of 16k tokens, and accumulate gradients for 4 iterations.
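
The settings quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when trying to reproduce the search. This is not taken from the authors' repository. All names below (`NOISE_GRID`, `grid_configs`, `train_max_decode_len`, `tokens_per_update`) are hypothetical. Three readings are assumptions: "16k" is taken as 16,000 tokens, the 1.3x length cap is rounded down, and the 16k-token batch is treated as the per-iteration size before gradient accumulation.

```python
from itertools import product

# Values quoted in the Experiment Setup row; all names are illustrative.
NOISE_GRID = [0.01, 0.1, 1.0]
ALPHA_GRID = [1.0, 10.0, 100.0]
NUM_CANDIDATES = 4
EVAL_MAX_LEN = 500       # evaluation-time decoding cap, in tokens
BATCH_TOKENS = 16_000    # assumption: "16k tokens" read as 16,000
ACCUM_STEPS = 4          # gradient accumulation iterations


def grid_configs():
    """Enumerate every (noise, alpha) pair in the quoted grid."""
    return [
        {"noise": noise, "alpha": alpha, "num_candidates": NUM_CANDIDATES}
        for noise, alpha in product(NOISE_GRID, ALPHA_GRID)
    ]


def train_max_decode_len(target_len: int) -> int:
    """Training-time decoding cap: 1.3x the ground-truth length.

    Integer arithmetic avoids float rounding; rounding down is an assumption.
    """
    return (13 * target_len) // 10


def tokens_per_update() -> int:
    """Tokens per optimizer step, assuming 16k is the per-iteration batch."""
    return BATCH_TOKENS * ACCUM_STEPS


if __name__ == "__main__":
    for cfg in grid_configs():
        print(cfg)
    print("train cap for a 100-token target:", train_max_decode_len(100))
    print("tokens per update:", tokens_per_update())
```

The grid has nine (noise, α) combinations. The row says α is selected by the validation task loss being optimized, so each combination would correspond to one fine-tuning run scored on the validation set.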