Neural Language Modeling by Jointly Learning Syntax and Lexicon
Authors: Yikang Shen, Zhouhan Lin, Chin-wei Huang, Aaron Courville
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on three tasks: word-level language modeling, character-level language modeling, and unsupervised constituency parsing. The proposed model achieves (or is close to) the state-of-the-art on both word-level and character-level language modeling. The model's unsupervised parsing outperforms some strong baseline models, demonstrating that the structure found by our model is similar to the intrinsic structure provided by human experts. |
| Researcher Affiliation | Academia | Yikang Shen, Zhouhan Lin, Chin-Wei Huang & Aaron Courville, Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC H3C 3J7, Canada {yi-kang.shen, zhouhan.lin, chin-wei.huang, aaron.courville}@umontreal.ca |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to open-source code or statements about code availability. |
| Open Datasets | Yes | We evaluate a character-level variant of our proposed language model over a preprocessed version of the Penn Treebank (PTB) and Text8 datasets. Penn Treebank: we process the Penn Treebank dataset (Marcus et al., 1993) by following the procedure introduced in (Mikolov et al., 2012). Text8: the dataset contains 17M training tokens and has a vocabulary size of 44k words. The dataset is partitioned into a training set (first 99M characters) and a development set (last 1M characters) that is used to report performance. |
| Dataset Splits | Yes | The dataset is partitioned into a training set (first 99M characters) and a development set (last 1M characters) that is used to report performance. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper mentions components such as the 'Adam' optimizer, 'Layer Normalization', and 'Batch Normalization', but does not provide version numbers for these or for any other software dependencies, such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | Optimization is performed with Adam using learning rate lr = 0.003, weight decay wdecay = 10^-6, β1 = 0.9, β2 = 0.999 and σ = 10^-8. We carry out gradient clipping with maximum norm 1.0. For character-level PTB, the Reading Network has two recurrent layers and the Predict Network has one residual block. Hidden state size is 1024 units. The input and output embedding sizes are 128, and are not shared. Look-back range L = 10, temperature parameter τ = 10, upper bound of memory span Nm = 20. We use a batch size of 64 and truncated backpropagation with 100 timesteps. The dropout values used on input/output embeddings, between recurrent layers, and on recurrent states were (0, 0.25, 0.1), respectively. |
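
The Dataset Splits row quotes the Text8 partition used by the paper: the first 99M characters for training and the last 1M characters for development. Below is a minimal sketch of that split, assuming the raw `text8` file (a single 100M-character string) has already been downloaded and unzipped locally; the file path is an assumption, not something the paper specifies.

```python
# Split the raw text8 character stream as described in the quoted setup:
# first 99M characters for training, last 1M for development.
TRAIN_CHARS = 99_000_000

with open("text8", "r") as f:      # path is a local placeholder
    data = f.read()

train_data = data[:TRAIN_CHARS]    # first 99M characters
dev_data = data[TRAIN_CHARS:]      # remaining ~1M characters

print(f"train: {len(train_data)} chars, dev: {len(dev_data)} chars")
```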
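
The Experiment Setup row lists the reported optimizer and regularization settings. Since the paper releases no code, the snippet below is only a hedged PyTorch sketch of how those hyperparameters map onto a standard Adam configuration with gradient clipping; `model` is a placeholder standing in for the authors' Reading/Predict networks, and interpreting the quoted σ = 10^-8 as Adam's `eps` term is an assumption.

```python
import torch

# Placeholder model; the paper's actual Reading/Predict networks are not released.
model = torch.nn.LSTM(input_size=128, hidden_size=1024, num_layers=2)

# Reported settings: lr = 0.003, weight decay = 1e-6, betas = (0.9, 0.999),
# and eps = 1e-8 (assumed to correspond to the quoted sigma term).
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.003,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=1e-6,
)

def training_step(loss: torch.Tensor) -> None:
    """One optimization step with the reported gradient clipping (max norm 1.0)."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

The remaining quoted settings (look-back range L = 10, temperature τ = 10, memory span bound Nm = 20, batch size 64, truncated BPTT over 100 timesteps, and the per-location dropout rates) are model- and loop-specific and are not reflected in this sketch.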