On the Softmax Bottleneck of Recurrent Language Models

Authors: Dwarak Govind Parthiban, Yongyi Mao, Diana Inkpen

AAAI 2021, pp. 13640-13647

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we show via an extensive empirical study that such a correlation is fairly weak and that the high rank of the log P matrix is neither necessary nor sufficient for better test perplexity. In our experiments, we reproduced the results of the baseline AWD-LSTM model and its SS, LMS-PLIF, and MoS counterparts. The models were trained on the Penn Treebank (PTB) dataset. (See the rank sketch after the table.)
Researcher Affiliation | Academia | Dwarak Govind Parthiban, Yongyi Mao, Diana Inkpen; University of Ottawa; yottabytt@gmail.com, ymao@uottawa.ca, diana.inkpen@uottawa.ca
Pseudocode | No | The paper describes functions and mathematical formulations but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The supplementary material (SM) and code can be accessed at https://github.com/yottabytt/awd-lstm-lmkit.
Open Datasets | Yes | Following previous works (Yang et al. 2018; Kanai et al. 2018; Ganea et al. 2019), for our language modeling experiments, we use the Penn Treebank (PTB) (Marcus, Santorini, and Marcinkiewicz 1993) and the WikiText-2 (WT2) (Merity et al. 2017) datasets.
Dataset Splits | No | The paper reports 'Validation ppl' in Table 2, implying the use of a validation set, but it does not explicitly state the train/validation/test split percentages or sample counts used to create these partitions.
Hardware Specification | Yes | All model training and evaluation were conducted using NVIDIA V100 GPUs with 32 GB of memory. To train a single instance of a model, we use only one GPU, not multiple GPUs.
Software Dependencies | No | The paper states 'Most of our implementation is based on the open source code released by the authors of AWD-LSTM and MoS,' but does not provide specific version numbers for any software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | Hyperparameter Configuration: To train an AWD-LSTM based model, there is a hyperparameter called the non-monotone interval n that is used to switch the optimization algorithm from SGD to Averaged SGD. The MoS model uses bsz = 12, d_h = 620, and d_e = 280, whereas the Softmax model uses bsz = 20, d_h = 400, and d_e = 400, where d_h and d_e denote the last hidden-layer and embedding dimensions, respectively. (See the NT-ASGD sketch after the table.)
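
The rank claim in the Research Type row can be made concrete with a small experiment. The following is a minimal PyTorch sketch, not the authors' code: the dimensions (n_contexts, d, vocab) are illustrative, with d = 400 chosen to match the Softmax configuration in the setup row. It shows that the log-probability matrix produced by a single softmax head has rank at most d + 1, no matter how large the vocabulary is.

```python
# Minimal sketch (illustrative dimensions, not the authors' code): empirical
# rank of the log-probability matrix produced by a plain softmax head.
import torch

torch.manual_seed(0)
n_contexts, d, vocab = 512, 400, 10000  # d = 400 matches the Softmax model above

H = torch.randn(n_contexts, d)  # hidden states, one row per context
W = torch.randn(vocab, d)       # output embedding matrix

# Rows of log_p are log P(. | context) = HW^T - logsumexp(HW^T) 1^T,
# i.e. a rank-<=d matrix plus a rank-1 per-row correction.
log_p = torch.log_softmax(H @ W.t(), dim=-1)

# The softmax bottleneck: rank(log_p) <= d + 1 (here, at most 401),
# even though the matrix is 512 x 10000.
print(torch.linalg.matrix_rank(log_p).item())
```

MoS raises this bound by mixing several softmaxes; the paper's point, per the excerpt above, is that the extra rank does not reliably translate into better test perplexity.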
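
The non-monotone interval n in the Experiment Setup row drives AWD-LSTM's NT-ASGD schedule: training starts with SGD and switches to Averaged SGD once the validation loss stops improving over the last n checks. Below is a minimal PyTorch sketch of that trigger; the function name, default values, and toy loop are ours, not taken from the paper's released code.

```python
# Minimal sketch (hypothetical helper, not the authors' code) of the
# NT-ASGD trigger used by AWD-LSTM: start with SGD, switch to ASGD once
# the validation loss has not improved over the last n checks.
import torch

def maybe_switch_to_asgd(optimizer, params, val_history, n=5, lr=30.0):
    """Return an ASGD optimizer once validation loss stops improving.

    Trigger: the newest validation loss is worse than the best loss
    observed more than n evaluations ago.
    """
    if (isinstance(optimizer, torch.optim.SGD)
            and len(val_history) > n
            and val_history[-1] > min(val_history[:-n])):
        return torch.optim.ASGD(params, lr=lr, t0=0, lambd=0.0)
    return optimizer

# Toy usage: a dummy parameter and a stagnating validation curve.
w = torch.nn.Parameter(torch.zeros(3))
opt = torch.optim.SGD([w], lr=30.0)
history = []
for val_loss in [5.0, 4.0, 3.9, 3.95, 3.92, 3.94, 3.96, 3.97]:
    history.append(val_loss)
    opt = maybe_switch_to_asgd(opt, [w], history, n=5)
print(type(opt).__name__)  # prints "ASGD" once the curve stagnates
```

In the released AWD-LSTM code the same check is performed inline in the training loop; the function form here is only for illustration.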