Evaluating Distributional Distortion in Neural Language Modeling
Authors: Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments reveal that LSTM and Transformer language models (i) systematically underestimate the probability of sequences drawn from the target language, and (ii) do so more severely for less probable sequences. (A scoring sketch of this per-sequence comparison follows the table.) |
| Researcher Affiliation | Collaboration | Benjamin LeBrun¹,², Alessandro Sordoni³,* & Timothy J. O'Donnell¹,²,⁴,*: ¹McGill University, ²Mila Quebec Artificial Intelligence Institute, ³Microsoft Research, ⁴Canada CIFAR AI Chair, Mila |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | All Transformer implementations were obtained from Huggingface, and training was done on two or four RTX-8000 GPUs (depending on model size) with mixed floating point precision. |
| Open Datasets | Yes | To define a generative model L, we train a randomly-initialized GPT2-medium on 1.5M sentences sampled from the OpenWebText corpus (Gokaslan & Cohen, 2019). |
| Dataset Splits | Yes | Models with the lowest cross-entropy loss on a withheld validation set are used in experiments unless otherwise mentioned. ... We begin by exploring model estimation error on a fixed training set D_train of 1M sequences sampled from p_L. ... we sample a test set D_test of 500,000 sequences from p_L. (A sampling sketch follows the table.) |
| Hardware Specification | Yes | All Transformer implementations were obtained from Huggingface, and training was done on two or four RTX-8000 GPUs (depending on model size) with mixed floating point precision. |
| Software Dependencies | No | We use the Huggingface (Wolf et al., 2020) implementations of GPT2-small, GPT2-medium and GPT2-large (Radford et al., 2019) as representative Transformer LMs. |
| Experiment Setup | Yes | For all model sizes, we use a batch size of 128 sequences. ... We use Adam optimization with ϵ = 1e-8 and learning rates α = 5e-5, α = 4e-5 and α = 3e-5 for GPT2-small, -medium and -large respectively. (A configuration sketch follows the table.) |
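
The paper's central measurement compares each sequence's probability under the generative model p_L with its probability under the trained model. The sketch below is not the authors' code: it shows one way such per-sequence scoring could be done with the Huggingface API, with the checkpoint paths and the placeholder test sentence as illustrative assumptions.

```python
# Minimal sketch: score sequences under the "true" model p_L and a trained model q,
# then inspect the log-probability gap (negative gap => underestimation by q).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

p_L = GPT2LMHeadModel.from_pretrained("path/to/generative-model").to(device).eval()  # assumed checkpoint
q = GPT2LMHeadModel.from_pretrained("path/to/trained-model").to(device).eval()       # assumed checkpoint

@torch.no_grad()
def sequence_log_prob(model, text):
    """Total log-probability of the sequence (conditioned on its first token)."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    # HF returns the mean token-level cross-entropy over the (len - 1) predicted tokens.
    loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

for text in ["a sequence sampled from p_L"]:          # placeholder for sequences in D_test
    true_lp = sequence_log_prob(p_L, text)
    model_lp = sequence_log_prob(q, text)
    print(true_lp, model_lp, model_lp - true_lp)
```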
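
The paper's D_train and D_test are drawn directly from p_L. A minimal sketch of such ancestral sampling with Huggingface's `generate`, assuming an illustrative checkpoint path, batch size, and maximum length:

```python
# Minimal sketch: draw sequences from p_L by unbiased ancestral sampling
# (no top-k / top-p truncation), as a stand-in for constructing D_train / D_test.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
p_L = GPT2LMHeadModel.from_pretrained("path/to/generative-model").to(device).eval()  # assumed checkpoint

@torch.no_grad()
def sample_sequences(n, max_length=128, batch_size=64):
    """Draw n sequences from p_L; max_length and batch_size are illustrative."""
    bos = torch.full((batch_size, 1), tokenizer.bos_token_id, device=device)
    samples = []
    while len(samples) < n:
        out = p_L.generate(bos, do_sample=True, top_k=0, top_p=1.0,
                           max_length=max_length, pad_token_id=tokenizer.eos_token_id)
        samples.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return samples[:n]

d_train = sample_sequences(1_000_000)   # the paper reports 1M training sequences
```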
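
The reported optimization settings (128-sequence batches, Adam with ϵ = 1e-8, learning rates of 5e-5 / 4e-5 / 3e-5 for GPT2-small / -medium / -large, mixed precision, and selection by validation loss) map naturally onto the Huggingface Trainer. The sketch below is an assumed reconstruction, not the authors' training script; the dataset objects and output path are placeholders.

```python
# Minimal sketch of the reported training configuration via the Huggingface Trainer.
from transformers import GPT2LMHeadModel, Trainer, TrainingArguments

LEARNING_RATES = {"gpt2": 5e-5, "gpt2-medium": 4e-5, "gpt2-large": 3e-5}
model_name = "gpt2-medium"

train_dataset = None   # placeholder: tokenized D_train sampled from p_L
valid_dataset = None   # placeholder: held-out validation split

model = GPT2LMHeadModel.from_pretrained(model_name)
args = TrainingArguments(
    output_dir="checkpoints/" + model_name,   # placeholder path
    per_device_train_batch_size=32,           # e.g. 32 per device x 4 GPUs = 128-sequence batches
    learning_rate=LEARNING_RATES[model_name],
    adam_epsilon=1e-8,
    fp16=True,                                # mixed floating-point precision
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,              # keep the checkpoint with the lowest validation loss
    metric_for_best_model="eval_loss",
)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset,
                  eval_dataset=valid_dataset)
trainer.train()
```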