Dynamic Evaluation of Neural Sequence Models
Authors: Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We apply dynamic evaluation to outperform all previous word-level perplexities on the Penn Treebank and WikiText-2 datasets (achieving 51.1 and 44.3 respectively) and all previous character-level cross-entropies on the text8 and Hutter Prize datasets (achieving 1.19 bits/char and 1.08 bits/char respectively)." and "7. Experiments: We applied dynamic evaluation to word- and character-level language modelling" |
| Researcher Affiliation | Academia | Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals; School of Informatics, University of Edinburgh. Correspondence to: Ben Krause <ben.krause@ed.ac.uk>. |
| Pseudocode | No | The paper describes the methodology in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code available at https://github.com/benkrause/dynamic-evaluation" |
| Open Datasets | Yes | "We performed word-level language modelling experiments on the Penn Treebank (PTB, Marcus et al., 1993) and WikiText-2 (Merity et al., 2017) datasets." and "The Hutter Prize dataset (Hutter, 2006) is comprised of Wikipedia text, including XML and characters from non-Latin languages." and "The text8 dataset is derived from the Hutter Prize dataset" |
| Dataset Splits | Yes | "After training the base model, we tune hyper-parameters for dynamic evaluation on the validation set, and evaluate both the static and dynamic versions of the model on the test set." and "We use the same test set as in Mikolov et al. (2014), but also hold out the final 100k training tokens as a validation set to allow for fair hyper-parameter tuning" and "We used a 90:5:5 split for training, validation, and testing." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and models such as AWD-LSTM, mLSTM, RMSprop, and Adam, but does not specify their version numbers or the versions of underlying libraries such as Python or PyTorch/TensorFlow. |
| Experiment Setup | No | The paper describes the methodology and hyper-parameter tuning process, including the use of sequence segments of length 5 for word-level tasks and 20 for character-level tasks. It mentions tuning the learning rate and decay parameters but does not provide specific values for these hyper-parameters or other training configurations (an illustrative sketch of the dynamic evaluation loop follows this table). |
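
For readers unfamiliar with the method being assessed, the sketch below illustrates the core dynamic evaluation loop: score each test segment with the current weights, then take a gradient step on that segment's loss and decay the weights back toward the static parameters. This is a simplified plain-SGD variant, not the authors' released code (the paper's full update is RMSprop-style); `dynamic_eval`, the default segment length, learning rate, and decay value, and the assumption that `model` is a PyTorch LSTM language model returning `(logits, hidden)` and accepting `hidden=None` are all illustrative.

```python
import torch
import torch.nn.functional as F

def dynamic_eval(model, tokens, seg_len=5, lr=1e-4, decay=1e-3):
    """Evaluate a 1-D LongTensor of token ids while adapting the model online.

    Each segment is scored *before* the update, so the reported loss is an
    honest held-out measurement; the decay term pulls the weights back toward
    the static parameters so the model tracks local statistics without drifting.
    """
    model.eval()  # dropout off; gradients still flow in eval mode
    theta0 = [p.detach().clone() for p in model.parameters()]  # static weights
    hidden = None  # assumed: model initialises its own state when given None
    total_nll, total_tokens = 0.0, 0

    for i in range(0, tokens.size(0) - 1, seg_len):
        tgt = tokens[i + 1 : i + 1 + seg_len]
        inp = tokens[i : i + tgt.size(0)].unsqueeze(1)   # (seg, batch=1)

        logits, hidden = model(inp, hidden)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), tgt)
        total_nll += loss.item() * tgt.size(0)
        total_tokens += tgt.size(0)

        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), theta0):
                if p.grad is not None:
                    p -= lr * p.grad        # gradient step on this segment
                p += decay * (p0 - p)       # decay toward static weights
        hidden = tuple(h.detach() for h in hidden)  # assumed LSTM (h, c) state

    return total_nll / total_tokens  # average NLL in nats per token
```

Static evaluation is the same loop with the gradient step and decay removed; word-level perplexity is `exp` of the returned value, and character-level bits/char is the returned value divided by `ln 2`.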