Deep Residual Output Layers for Neural Language Generation

Authors: Nikolaos Pappas, James Henderson

Venue: ICML 2019

Reproducibility assessment (variable, result, and supporting LLM response from the paper):
Research Type: Experimental. "Evaluations on three language generation tasks show that our output label mapping can match or improve state-of-the-art recurrent and self-attention architectures, and suggest that the classifier does not necessarily need to be high-rank to better model natural language if it is better at capturing the structure of the output space."
Researcher Affiliation: Academia. "Idiap Research Institute, Martigny, Switzerland. Correspondence to: Nikolaos Pappas <nikolaos.pappas@idiap.ch>."
Pseudocode: No. No pseudocode or algorithm blocks are present in the paper.
Open Source Code: Yes. "Our code and settings are available at http://github.com/idiap/drill."
Open Datasets: Yes. "Following previous work in language modeling (Yang et al., 2018; Krause et al., 2018; Merity et al., 2017; Melis et al., 2017), we evaluate the proposed model in terms of perplexity on two widely used language modeling datasets, namely Penn Treebank (Mikolov et al., 2010) and WikiText-2 (Merity et al., 2017), which have vocabularies of 10,000 and 33,278 words, respectively."
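Perplexity, the metric quoted above, is the exponential of the average per-token cross-entropy. A minimal PyTorch sketch, for reference only (the helper below is illustrative and not from the paper's codebase):

    import torch
    import torch.nn.functional as F

    def perplexity(logits, targets):
        # logits: (num_tokens, vocab_size) unnormalized scores
        # targets: (num_tokens,) gold token ids
        # perplexity = exp(mean per-token cross-entropy)
        nll = F.cross_entropy(logits, targets, reduction="mean")
        return torch.exp(nll).item()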
Dataset Splits: Yes. "Our hyper-parameters were optimized based on validation perplexity, as follows: 4-layer label encoder depth, 400-dimensional label embeddings, 0.6 dropout rate, residual connection to E, uniform weight initialization in the interval [-0.1, 0.1], for both datasets, and, furthermore, sigmoid activation and variational dropout for Penn Treebank, as well as relu activation and standard dropout for WikiText-2." For NMT: "using the Newstest2013 set for validation and the Newstest2014 set for testing."
Hardware Specification: No. The paper does not specify the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies: No. "For the implementation of the AWD-LSTM we used the language modeling toolkit in Pytorch provided by Merity et al. (2017), and for the dynamic evaluation the code in Pytorch provided by Krause et al. (2018)." (The paper names software such as PyTorch and OpenNMT but gives no specific version numbers.)
Experiment Setup: Yes. "Our hyper-parameters were optimized based on validation perplexity, as follows: 4-layer label encoder depth, 400-dimensional label embeddings, 0.6 dropout rate, residual connection to E, uniform weight initialization in the interval [-0.1, 0.1], for both datasets, and, furthermore, sigmoid activation and variational dropout for Penn Treebank, as well as relu activation and standard dropout for WikiText-2." For NMT: "2-layer label encoder depth, 512-dimensional label embeddings, 0.0 dropout rate, sigmoid activation function, residual connection to E, and uniform weight initialization in [-0.1, 0.1]."
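To make the quoted configuration concrete, here is a minimal PyTorch sketch of a deep residual output layer under the language-modeling settings above (4-layer label encoder, 400-dimensional label embeddings, sigmoid activation, 0.6 dropout, residual connection to the label embedding matrix E, uniform initialization in [-0.1, 0.1]). It is an illustrative reconstruction under these assumptions, not the authors' implementation; see http://github.com/idiap/drill for the actual code.

    import torch
    import torch.nn as nn

    class DeepResidualOutput(nn.Module):
        # Scores each context vector against label embeddings E that have been
        # passed through a small residual MLP ("label encoder"), rather than
        # against E directly.
        def __init__(self, vocab_size=10000, emb_dim=400, depth=4, dropout=0.6):
            super().__init__()
            self.E = nn.Embedding(vocab_size, emb_dim)    # label embeddings E
            nn.init.uniform_(self.E.weight, -0.1, 0.1)    # init in [-0.1, 0.1]
            self.layers = nn.ModuleList(
                [nn.Linear(emb_dim, emb_dim) for _ in range(depth)])
            for layer in self.layers:
                nn.init.uniform_(layer.weight, -0.1, 0.1)
            self.drop = nn.Dropout(dropout)

        def forward(self, hidden):
            # hidden: (batch, emb_dim) context vectors from the sentence encoder
            u = self.E.weight                             # (vocab, emb_dim)
            for layer in self.layers:
                u = self.drop(torch.sigmoid(layer(u)))    # deep label encoder
            u = u + self.E.weight                         # residual connection to E
            return hidden @ u.t()                         # (batch, vocab) logits

    # Example: 32 context vectors scored over the 10,000-word PTB vocabulary.
    logits = DeepResidualOutput()(torch.randn(32, 400))   # shape (32, 10000)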