Variational Smoothing in Recurrent Neural Network Language Models
Authors: Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify our analysis on two benchmark language modeling datasets and demonstrate performance improvements over existing data noising methods. |
| Researcher Affiliation | Industry | Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama; DeepMind; {lingpenk, melisgl, lingwang, leiyu, dyogatama}@google.com |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any link or statement about making its source code publicly available. |
| Open Datasets | Yes | We evaluate our approaches on two standard language modeling datasets: Penn Treebank (Marcus et al., 1994) and Wikitext-2 (Merity et al., 2017). |
| Dataset Splits | No | The paper mentions using a 'development set' and 'test set' but does not specify the exact percentages or sample counts for training, validation, and test splits, nor does it cite predefined splits with specific details. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory, or specific computing infrastructure) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like LSTM and RMSprop but does not provide specific version numbers for these or other dependencies required for reproducibility. |
| Experiment Setup | Yes | We tune the RMSprop learning rate and ℓ2 regularization hyperparameter λ for all models on a development set by a grid search on {0.002, 0.003, 0.004} and {10⁻⁴, 10⁻³} respectively, and use perplexity on the development set to choose the best model. We also tune γ from {0.1, 0.2, 0.3, 0.4}. We use recurrent dropout (Semeniuta et al., 2016) for R and set it to 0.2, and apply (element-wise) input and output embedding dropouts for E and O and set it to 0.5 when E, O ∈ ℝ^{|V|×512} and 0.7 when E, O ∈ ℝ^{|V|×1024} based on preliminary experiments. We tie the input and output embedding matrices in all our experiments (i.e., E = O), except for the vanilla LSTM model, where we report results for both tied and untied. |
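
The experiment-setup row above amounts to a grid search over the RMSprop learning rate, the ℓ2 strength λ, and the smoothing weight γ, with fixed dropout rates and tied embeddings. The sketch below illustrates that search loop under stated assumptions: `build_model` and `evaluate_perplexity` are hypothetical placeholders (the paper does not release code or name a framework), and only the hyperparameter grids and dropout values are taken from the quoted text.

```python
# Minimal sketch of the hyperparameter grid search described in the paper's setup.
# `build_model` and `evaluate_perplexity` are hypothetical placeholders, not
# functions from the paper or any specific library.
import itertools

LEARNING_RATES = [0.002, 0.003, 0.004]   # RMSprop learning rates (from the paper)
L2_LAMBDAS = [1e-4, 1e-3]                # l2 regularization strengths
GAMMAS = [0.1, 0.2, 0.3, 0.4]            # smoothing hyperparameter gamma

EMBEDDING_SIZE = 512                     # the paper uses 512 or 1024
EMBED_DROPOUT = 0.5 if EMBEDDING_SIZE == 512 else 0.7   # input/output embedding dropout
RECURRENT_DROPOUT = 0.2                  # recurrent dropout (Semeniuta et al., 2016)


def grid_search(train_data, dev_data):
    """Return the configuration with the lowest development-set perplexity."""
    best_ppl, best_config = float("inf"), None
    for lr, lam, gamma in itertools.product(LEARNING_RATES, L2_LAMBDAS, GAMMAS):
        model = build_model(                      # hypothetical model constructor
            embedding_size=EMBEDDING_SIZE,
            recurrent_dropout=RECURRENT_DROPOUT,
            embedding_dropout=EMBED_DROPOUT,
            tie_embeddings=True,                  # E = O, except the untied vanilla LSTM baseline
            l2_lambda=lam,
            gamma=gamma,
        )
        model.train(train_data, optimizer="rmsprop", learning_rate=lr)
        ppl = evaluate_perplexity(model, dev_data)   # hypothetical evaluation helper
        if ppl < best_ppl:
            best_ppl, best_config = ppl, (lr, lam, gamma)
    return best_config, best_ppl
```

Model selection here mirrors the paper's stated procedure: every grid point is trained and the configuration with the best development-set perplexity is kept; test-set perplexity would then be reported only for that selected model.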