Language Modeling with Gated Convolutional Networks

Authors: Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report results on two public large-scale language modeling datasets. First, the Google Billion Word dataset (Chelba et al., 2013)... Second, WikiText-103... (Merity et al., 2016). We compare the different gating schemes experimentally in Section 5.2.
Researcher Affiliation | Industry | Facebook AI Research. Correspondence to: Yann N. Dauphin <ynd@fb.com>.
Pseudocode | No | The paper presents mathematical equations and describes the architecture verbally but does not include any formal pseudocode or algorithm blocks. (A minimal sketch of the described gating mechanism is given after this table.)
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We report results on two public large-scale language modeling datasets. First, the Google Billion Word dataset (Chelba et al., 2013)... Second, WikiText-103... (Merity et al., 2016).
Dataset Splits | No | The paper states 'We found good hyper-parameter configurations by cross-validating with random search on a validation set' and mentions evaluating 'on the standard held out test portion of each dataset'. However, it does not give the size or percentage of the validation set, nor the methodology used to create the split.
Hardware Specification | Yes | We implement our models in Torch (Collobert et al., 2011) and train on Tesla M40 GPUs. The majority of our models are trained on single GPU... We trained larger models with an 8-GPU setup...
Software Dependencies | No | The paper states 'We implement our models in Torch (Collobert et al., 2011)'. While the citation points to 'Torch7', the text itself only mentions 'Torch' without an explicit version number for the implementation used.
Experiment Setup | Yes | In terms of optimization, we initialize the layers of the model with the Kaiming initialization (He et al., 2015b), with the learning rate sampled uniformly in the interval [1., 2.], the momentum set to 0.99, and clipping set to 0.1. (A sketch of this setup follows the gating example below.)
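As the Pseudocode row notes, the paper specifies its gating scheme only through equations. For reference, the following is a minimal PyTorch sketch of a gated linear unit (GLU) convolution in the spirit of the paper's h(X) = (X*W + b) ⊗ σ(X*V + c). The class name, layer sizes, and causal-padding details are illustrative assumptions, not the authors' exact implementation (which was written in Torch).

```python
import torch
import torch.nn as nn


class GatedConvBlock(nn.Module):
    """Minimal sketch of a gated linear unit (GLU) convolution:
    h(X) = (X*W + b) * sigmoid(X*V + c).
    Hypothetical class; sizes and padding are illustrative only."""

    def __init__(self, in_channels: int, out_channels: int, kernel_width: int):
        super().__init__()
        # A single convolution produces both the linear path and the gate.
        self.conv = nn.Conv1d(in_channels, 2 * out_channels, kernel_width)
        # Left-pad so each position only sees current and previous tokens (causal).
        self.pad = nn.ConstantPad1d((kernel_width - 1, 0), 0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        a, b = self.conv(self.pad(x)).chunk(2, dim=1)
        return a * torch.sigmoid(b)  # element-wise gating
```

The final two lines are equivalent to `torch.nn.functional.glu(..., dim=1)`, which performs the same split-and-gate operation.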
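The Experiment Setup quote pins down several hyper-parameters. Below is a hedged sketch of that setup; the helper names are hypothetical, and the use of plain SGD with Nesterov momentum and global-norm gradient clipping is an assumption, since the quote does not specify the exact optimizer or clipping variant.

```python
import random

import torch
import torch.nn as nn


def build_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """Hypothetical helper reflecting the reported setup: Kaiming initialization,
    learning rate sampled uniformly from [1.0, 2.0], momentum 0.99."""
    for module in model.modules():
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            nn.init.kaiming_normal_(module.weight)  # Kaiming (He et al., 2015) init
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    lr = random.uniform(1.0, 2.0)  # random-search draw from the stated interval [1., 2.]
    # SGD with Nesterov momentum is an assumption; the quote only states momentum = 0.99.
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99, nesterov=True)


def training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                  loss: torch.Tensor, clip: float = 0.1) -> None:
    optimizer.zero_grad()
    loss.backward()
    # "Clipping set to 0.1" is interpreted here as clipping the global gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
```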