Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Authors: Brian DuSell, David Chiang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show validation set performance as a function of training time in Figure 1, and test performance binned by string length in Figure 2 (see also Appendix C for wall-clock training times). |
| Researcher Affiliation | Academia | Brian DuSell and David Chiang, Department of Computer Science and Engineering, University of Notre Dame, {bdusell1,dchiang}@nd.edu |
| Pseudocode | No | The paper presents equations describing the model's operations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/bdusell/nondeterministic-stack-rnn. |
| Open Datasets | Yes | We ran exploratory experiments with the NS-RNN, RNS-RNN, and other language models on the Penn Treebank (PTB) as preprocessed by Mikolov et al. (2011). We randomly sample from these languages to create training, validation, and test sets. |
| Dataset Splits | Yes | For every training run, we sample a training set of 10,000 strings from the PCFG, with lengths drawn uniformly from [40, 80]. Similarly, we sample a validation set of 1,000 strings with lengths drawn uniformly from [40, 80]. |
| Hardware Specification | Yes | We ran experiments for the NS models in GPU mode on a pool of the following NVIDIA GPU models, automatically selected based on availability: GeForce GTX TITAN X, TITAN X (Pascal), and GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions developing code in a 'Docker container' and provides a GitHub repository, but it does not specify versions for key software dependencies or libraries used for the experiments (e.g., Python version, deep learning framework version like PyTorch or TensorFlow). |
| Experiment Setup | Yes | The hyperparameters for our baseline LSTM, initialization, and optimization scheme are based on the unregularized LSTM experiments of Semeniuta et al. (2016). We train all models using simple stochastic gradient descent (SGD)... and truncated BPTT with a sequence length of 35. For all models, we use a minibatch size of 32. We randomly initialize all parameters uniformly from the interval [-0.05, 0.05]. We divide the learning rate by 1.5 whenever the validation perplexity does not improve... |
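The "Dataset Splits" row above describes how each training run's data is sampled: 10,000 training strings and 1,000 validation strings, with string lengths drawn uniformly from [40, 80]. The sketch below illustrates that split protocol only. The function `sample_string_of_length` is a hypothetical placeholder, not one of the paper's PCFG samplers; it simply emits a balanced a...ab...b string so the example runs end to end.

```python
# Minimal sketch of the data-split sizes and length distribution quoted in
# the "Dataset Splits" row. The sampler below is a placeholder, NOT one of
# the paper's grammars.
import random

def sample_string_of_length(length):
    # Hypothetical stand-in for a PCFG sampler: a string of exactly
    # `length` symbols of the form a...ab...b.
    half = length // 2
    return ["a"] * half + ["b"] * (length - half)

def sample_split(num_strings, min_len=40, max_len=80, seed=None):
    # Each string's length is drawn uniformly from [min_len, max_len].
    rng = random.Random(seed)
    return [
        sample_string_of_length(rng.randint(min_len, max_len))
        for _ in range(num_strings)
    ]

train_set = sample_split(10_000, seed=0)  # training set: 10,000 strings
valid_set = sample_split(1_000, seed=1)   # validation set: 1,000 strings
```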
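The "Experiment Setup" row quotes the optimization recipe: plain SGD, minibatch size 32, truncated BPTT with sequence length 35, uniform parameter initialization in [-0.05, 0.05], and a learning rate divided by 1.5 whenever validation perplexity fails to improve. The PyTorch sketch below shows one way to wire those quoted settings together; it is not the authors' implementation (their code is in the linked GitHub repository), and the model architecture, hidden size, vocabulary size, and initial learning rate are placeholders.

```python
# Hedged sketch of the training recipe from the "Experiment Setup" row,
# assuming PyTorch. Only the quoted hyperparameters (batch size 32, BPTT
# length 35, init range [-0.05, 0.05], LR divided by 1.5 on no improvement)
# come from the paper; everything else is a placeholder.
import torch
import torch.nn as nn

VOCAB_SIZE = 10      # placeholder vocabulary size
BPTT_LENGTH = 35     # truncated BPTT sequence length (quoted)
BATCH_SIZE = 32      # minibatch size (quoted)

class TinyLSTMLM(nn.Module):
    # Placeholder LSTM language model, not the paper's architecture.
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        h, hidden = self.lstm(self.embed(x), hidden)
        return self.out(h), hidden

model = TinyLSTMLM(VOCAB_SIZE)

# Initialize all parameters uniformly from [-0.05, 0.05] (quoted).
for param in model.parameters():
    nn.init.uniform_(param, -0.05, 0.05)

optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # initial LR is a placeholder

# Divide the learning rate by 1.5 (factor = 1/1.5) whenever the monitored
# validation perplexity does not improve (quoted).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=1 / 1.5, patience=0)

def train_epoch(batches, hidden=None):
    # `batches` yields (inputs, targets) LongTensor pairs of shape
    # (BATCH_SIZE, BPTT_LENGTH).
    for inputs, targets in batches:
        optimizer.zero_grad()
        if hidden is not None:
            # Truncated BPTT: detach the hidden state so gradients stop at
            # the chunk boundary.
            hidden = tuple(h.detach() for h in hidden)
        logits, hidden = model(inputs, hidden)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        loss.backward()
        optimizer.step()

# After evaluating each epoch:
#   scheduler.step(validation_perplexity)  # LR := LR / 1.5 on no improvement
```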