Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Authors: Brian DuSell, David Chiang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show validation set performance as a function of training time in Figure 1, and test performance binned by string length in Figure 2 (see also Appendix C for wall-clock training times). |
| Researcher Affiliation | Academia | Brian DuSell and David Chiang, Department of Computer Science and Engineering, University of Notre Dame, {bdusell1,dchiang}@nd.edu |
| Pseudocode | No | The paper presents equations describing the model's operations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/bdusell/nondeterministic-stack-rnn. |
| Open Datasets | Yes | We ran exploratory experiments with the NS-RNN, RNS-RNN, and other language models on the Penn Treebank (PTB) as preprocessed by Mikolov et al. (2011). We randomly sample from these languages to create training, validation, and test sets. |
| Dataset Splits | Yes | For every training run, we sample a training set of 10,000 strings from the PCFG, with lengths drawn uniformly from [40, 80]. Similarly, we sample a validation set of 1,000 strings with lengths drawn uniformly from [40, 80]. |
| Hardware Specification | Yes | We ran experiments for the NS models in GPU mode on a pool of the following NVIDIA GPU models, automatically selected based on availability: GeForce GTX TITAN X, TITAN X (Pascal), and GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions developing code in a 'Docker container' and provides a GitHub repository, but it does not specify versions for key software dependencies or libraries used for the experiments (e.g., Python version, deep learning framework version like PyTorch or TensorFlow). |
| Experiment Setup | Yes | The hyperparameters for our baseline LSTM, initialization, and optimization scheme are based on the unregularized LSTM experiments of Semeniuta et al. (2016). We train all models using simple stochastic gradient descent (SGD)... and truncated BPTT with a sequence length of 35. For all models, we use a minibatch size of 32. We randomly initialize all parameters uniformly from the interval [-0.05, 0.05]. We divide the learning rate by 1.5 whenever the validation perplexity does not improve... |
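The "Dataset Splits" row above describes how each training run's data is sampled: 10,000 training strings and 1,000 validation strings, with string lengths drawn uniformly from [40, 80]. The sketch below illustrates that split protocol only. The function `sample_string_of_length` is a hypothetical placeholder, not one of the paper's PCFG samplers; it simply emits a balanced a...ab...b string so the example runs end to end.

```python
# Minimal sketch of the data-split sizes and length distribution quoted in
# the "Dataset Splits" row. The sampler below is a placeholder, NOT one of
# the paper's grammars.
import random

def sample_string_of_length(length):
    # Hypothetical stand-in for a PCFG sampler: a string of exactly
    # `length` symbols of the form a...ab...b.
    half = length // 2
    return ["a"] * half + ["b"] * (length - half)

def sample_split(num_strings, min_len=40, max_len=80, seed=None):
    # Each string's length is drawn uniformly from [min_len, max_len].
    rng = random.Random(seed)
    return [
        sample_string_of_length(rng.randint(min_len, max_len))
        for _ in range(num_strings)
    ]

train_set = sample_split(10_000, seed=0)  # training set: 10,000 strings
valid_set = sample_split(1_000, seed=1)   # validation set: 1,000 strings
```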
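The "Experiment Setup" row quotes the optimization recipe: plain SGD, minibatch size 32, truncated BPTT with sequence length 35, uniform parameter initialization in [-0.05, 0.05], and a learning rate divided by 1.5 whenever validation perplexity fails to improve. The PyTorch sketch below shows one way to wire those quoted settings together; it is not the authors' implementation (their code is in the linked GitHub repository), and the model architecture, hidden size, vocabulary size, and initial learning rate are placeholders.

```python
# Hedged sketch of the training recipe from the "Experiment Setup" row,
# assuming PyTorch. Only the quoted hyperparameters (batch size 32, BPTT
# length 35, init range [-0.05, 0.05], LR divided by 1.5 on no improvement)
# come from the paper; everything else is a placeholder.
import torch
import torch.nn as nn

VOCAB_SIZE = 10      # placeholder vocabulary size
BPTT_LENGTH = 35     # truncated BPTT sequence length (quoted)
BATCH_SIZE = 32      # minibatch size (quoted)

class TinyLSTMLM(nn.Module):
    # Placeholder LSTM language model, not the paper's architecture.
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        h, hidden = self.lstm(self.embed(x), hidden)
        return self.out(h), hidden

model = TinyLSTMLM(VOCAB_SIZE)

# Initialize all parameters uniformly from [-0.05, 0.05] (quoted).
for param in model.parameters():
    nn.init.uniform_(param, -0.05, 0.05)

optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # initial LR is a placeholder

# Divide the learning rate by 1.5 (factor = 1/1.5) whenever the monitored
# validation perplexity does not improve (quoted).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=1 / 1.5, patience=0)

def train_epoch(batches, hidden=None):
    # `batches` yields (inputs, targets) LongTensor pairs of shape
    # (BATCH_SIZE, BPTT_LENGTH).
    for inputs, targets in batches:
        optimizer.zero_grad()
        if hidden is not None:
            # Truncated BPTT: detach the hidden state so gradients stop at
            # the chunk boundary.
            hidden = tuple(h.detach() for h in hidden)
        logits, hidden = model(inputs, hidden)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        loss.backward()
        optimizer.step()

# After evaluating each epoch:
#   scheduler.step(validation_perplexity)  # LR := LR / 1.5 on no improvement
```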