Multiplicative LSTM for sequence modelling

Authors: Ben Krause, Iain Murray, Steve Renals, Liang Lu

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that mLSTM outperforms standard LSTM and its deep variants for a range of character level modelling tasks, and that this improvement increases with the complexity of the task. This model achieves a test error of 1.19 bits/character on the last 4 million characters of the Hutter prize dataset when combined with dynamic evaluation. (An mLSTM cell sketch appears after this table.)
Researcher Affiliation | Academia | Ben Krause, Iain Murray & Steve Renals, School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK ({ben.krause,i.murray,s.renals}@ed.ac.uk); Liang Lu, Toyota Technological Institute at Chicago, Chicago, Illinois, USA ({llu}@ttic.edu)
Pseudocode | No | The paper contains mathematical equations describing the model, but no structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Code to replicate our large scale experiments on the Hutter prize dataset is available at https://github.com/benkrause/mLSTM.
Open Datasets | Yes | We used the Penn Treebank dataset (Marcus et al., 1993) to test small scale language modelling, the processed and raw versions of the Wikipedia text8 dataset (Hutter, 2012) to test large scale language modelling and byte level language modelling respectively, and the European parliament dataset (Koehn, 2005) to investigate multilingual fitting.
Dataset Splits | Yes | The first 90 million characters were used for training, the next 5 million for validation, and the final 5 million for testing; the 100 million character text8 corpus was thus split 90-5-5 for training, validation, and testing. (See the split sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models, memory, or specific computing environments.
Software Dependencies | No | The paper mentions using 'a variant of RMSprop' but does not specify any software libraries, frameworks, or programming languages with their respective version numbers.
Experiment Setup | Yes | Gradient computation in these experiments used truncated backpropagation through time on sequences of length 100, only resetting the hidden state every 10,000 timesteps to allow networks access to information far in the past. All experiments used a variant of RMSprop (Tieleman & Hinton, 2012), with normalized updates in place of a learning rate. We fitted an mLSTM with 700 hidden units to the Penn Treebank dataset, with no regularization other than early stopping. We trained an mLSTM with hidden dimensionality of 1900 on the text8 dataset. All experiments were run for 4 epochs. (A training-schedule sketch follows this table.)
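
To make the model being assessed concrete, here is a minimal NumPy sketch of a single mLSTM timestep. It assumes the formulation described in the paper, in which an intermediate multiplicative state m_t = (W_mx x_t) ⊙ (W_mh h_{t-1}) replaces h_{t-1} in the gates of an otherwise standard LSTM; the exact gate parameterization, the omission of biases, and the shapes and initialization below are illustrative assumptions, not the authors' released Torch code.

```python
# Minimal sketch of one mLSTM step (assumptions noted above; not the authors' code).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h_prev, c_prev, p):
    """One mLSTM timestep. `p` is a dict of weight matrices (biases omitted for brevity)."""
    m = (p["W_mx"] @ x) * (p["W_mh"] @ h_prev)   # multiplicative intermediate state
    h_hat = p["W_hx"] @ x + p["W_hm"] @ m        # candidate update
    i = sigmoid(p["W_ix"] @ x + p["W_im"] @ m)   # input gate
    f = sigmoid(p["W_fx"] @ x + p["W_fm"] @ m)   # forget gate
    o = sigmoid(p["W_ox"] @ x + p["W_om"] @ m)   # output gate
    c = f * c_prev + i * np.tanh(h_hat)          # cell state
    h = np.tanh(c) * o                           # hidden state
    return h, c

# Example shapes: a 700-unit hidden state as in the Penn Treebank setting; the
# 50-dimensional character input is an assumption for illustration only.
n_in, n_h = 50, 700
rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.01, size=(n_h, n_in if k.endswith("x") else n_h))
     for k in ["W_mx", "W_mh", "W_hx", "W_hm", "W_ix", "W_im", "W_fx", "W_fm", "W_ox", "W_om"]}
h, c = mlstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
```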
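The 90M/5M/5M split quoted in the Dataset Splits row can be reproduced in a few lines. The file name and byte-level slicing here are assumptions; the paper only states the character counts for the 100 million character text8 corpus.

```python
# Hypothetical reproduction of the text8 split described above: first 90M characters
# for training, next 5M for validation, final 5M for testing.
with open("text8", "rb") as f:   # the "text8" file name is an assumption
    data = f.read()

train = data[:90_000_000]
valid = data[90_000_000:95_000_000]
test = data[95_000_000:]
print(len(train), len(valid), len(test))  # 90,000,000 / 5,000,000 / 5,000,000
```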
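The Experiment Setup row describes the training schedule only in prose. The PyTorch-style loop below sketches one plausible reading of it: truncated backpropagation through time over length-100 chunks, carrying the hidden state across chunks and resetting it every 10,000 timesteps, and an RMSprop-style update whose step is normalized rather than scaled by a conventional learning rate. The model interface, decay constant, step size, and the particular normalization (dividing the preconditioned gradient by its global norm) are assumptions; the paper does not specify these details.

```python
# A rough sketch of the training schedule under the assumptions stated above.
import torch

def train(model, params, stream, seq_len=100, reset_every=10_000,
          step_size=1e-3, decay=0.95, eps=1e-8):
    """Truncated BPTT with periodic hidden-state resets and normalized RMSprop-style updates."""
    ms = [torch.zeros_like(p) for p in params]       # second-moment accumulators
    hidden, steps_since_reset = None, 0
    for start in range(0, len(stream) - seq_len - 1, seq_len):
        if steps_since_reset >= reset_every:
            hidden, steps_since_reset = None, 0      # reset the hidden state every 10,000 steps
        x = stream[start:start + seq_len]            # input chunk of length 100
        y = stream[start + 1:start + seq_len + 1]    # next-character targets
        loss, hidden = model(x, y, hidden)           # assumed model interface
        hidden = tuple(h.detach() for h in hidden)   # truncate backprop at the chunk boundary
        for p in params:
            p.grad = None
        loss.backward()
        with torch.no_grad():
            precond = []
            for p, m in zip(params, ms):
                m.mul_(decay).addcmul_(p.grad, p.grad, value=1 - decay)
                precond.append(p.grad / (m.sqrt() + eps))
            total_norm = torch.sqrt(sum((g ** 2).sum() for g in precond)).item()
            for p, g in zip(params, precond):
                p.add_(g, alpha=-step_size / (total_norm + eps))  # normalized update
        steps_since_reset += seq_len
```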