Neural Text Generation With Unlikelihood Training
Authors: Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We follow a standard language modeling setup from Baevski and Auli (2019) and evaluate our method on the task of sequence completion, detailed below. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving superior generations using standard greedy or beam search. According to human evaluations, our approach with standard beam search also outperforms the currently popular decoding methods of nucleus sampling or beam blocking, thus providing a strong alternative to existing techniques. |
| Researcher Affiliation | Collaboration | New York University, Facebook AI Research, CIFAR Azrieli Global Scholar |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and trained models are available at https://github.com/facebookresearch/unlikelihood_training; implemented with Fairseq (Ott et al., 2019). |
| Open Datasets | Yes | We use the Wikitext-103 dataset (Merity et al., 2016), a large-scale collection of Wikipedia articles containing over 100 million words and 260 thousand unique tokens. |
| Dataset Splits | No | The paper states it uses the Wikitext-103 dataset and evaluates on its validation set, but it does not provide specific percentages or sample counts for the training, validation, and test splits, nor does it cite a source that defines these specific splits for reproducibility. |
| Hardware Specification | No | The paper mentions training on "8 GPUs" and later "a single GPU" due to "GPU memory constraints" but does not specify the model or type of GPUs used (e.g., NVIDIA A100, Tesla V100), or any other specific hardware details like CPU or memory. |
| Software Dependencies | No | The paper states "implemented with Fairseq (Ott et al., 2019)" but does not specify a version number for Fairseq or any other software dependency. |
| Experiment Setup | Yes | We train on fixed-length contiguous sequences, in our case of length 1,536... For the token-level losses (L_MLE, L_UL-token), we train each model on 8 GPUs for a maximum of 150k updates, evaluating on the validation set and saving the model state every 10k updates. Models are fine-tuned for 1,500 total updates. With probability 0.5 an update uses L_ULS... The experiments use a prefix length k = 50 and continuation length N = 100 for fine-tuning. For deterministic decoding we use greedy search and beam search with beam size 10, and for stochastic decoding we use top-k sampling with k ∈ {3, 50} and nucleus sampling with p ∈ {0.3, 0.9}. (Hedged sketches of the token-level objective and the fine-tuning schedule follow this table.) |
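The setup quoted above centers on the token-level unlikelihood objective (L_UL-token). The following is a minimal, single-sequence PyTorch sketch of that objective; the function name, tensor shapes, and candidate construction are our own simplification, and the official Fairseq implementation in the linked repository handles batching and padding differently.

```python
import torch
import torch.nn.functional as F

def token_unlikelihood_loss(logits: torch.Tensor,
                            targets: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    """Single-sequence sketch of the token-level unlikelihood objective.

    logits:  (T, V) per-step scores from a language model
    targets: (T,)   gold next tokens

    At step t, the negative candidates are the previously seen target
    tokens (excluding the gold token for that step); their probability is
    pushed down via -log(1 - p(c | x_<t)), added to the usual MLE term.
    """
    log_probs = F.log_softmax(logits, dim=-1)              # (T, V)
    mle = F.nll_loss(log_probs, targets, reduction="sum")  # standard LM loss

    T, V = log_probs.shape
    prev = targets.unsqueeze(0).expand(T, T)               # prev[t, i] = targets[i]
    keep = torch.ones(T, T, device=targets.device).tril(-1).bool()  # only i < t
    keep &= prev != targets.unsqueeze(1)                   # drop the gold token at t
    rows = keep.nonzero(as_tuple=True)[0]
    cand = torch.zeros(T, V, device=logits.device)
    cand[rows, prev[keep]] = 1.0                           # mark negative candidates

    one_minus_p = torch.clamp(1.0 - log_probs.exp(), min=1e-5)
    ul = -(torch.log(one_minus_p) * cand).sum()            # unlikelihood term
    return mle + alpha * ul
```

The sequence-level variant follows the same idea, except that the negative candidates are tokens belonging to repeated n-grams in a decoded continuation rather than previously seen context tokens.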
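The fine-tuning schedule in the Experiment Setup row (1,500 total updates; with probability 0.5 an update uses the sequence-level loss; prefix length k = 50 and continuation length N = 100) can be outlined as follows. The function signature and the loss callables are our own schematic placeholders, not the paper's or Fairseq's API.

```python
import random

def finetune(model, optimizer, batches, token_loss_fn, seq_loss_fn,
             total_updates=1500, mix_prob=0.5,
             prefix_len=50, continuation_len=100):
    """Schematic outline of the quoted fine-tuning schedule: 1,500 updates,
    each using the sequence-level loss with probability 0.5 and the
    token-level loss otherwise."""
    for _, (prefix, target) in zip(range(total_updates), batches):
        if random.random() < mix_prob:
            # Sequence-level update: decode a length-N continuation from the
            # length-k prefix and penalize its repeated n-grams.
            loss = seq_loss_fn(model, prefix[:prefix_len], continuation_len)
        else:
            # Token-level update (MLE or token-level unlikelihood,
            # depending on the variant being fine-tuned).
            loss = token_loss_fn(model, prefix, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```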