Trellis Networks for Sequence Modeling
Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that trellis networks outperform current state-of-the-art methods on a variety of challenging benchmarks, including word-level and character-level language modeling tasks, as well as stress tests designed to evaluate long-term memory retention. |
| Researcher Affiliation | Collaboration | Shaojie Bai (Carnegie Mellon University); J. Zico Kolter (Carnegie Mellon University and Bosch Center for AI); Vladlen Koltun (Intel Labs) |
| Pseudocode | No | The paper describes the Trellis Network architecture and its computations using mathematical equations and descriptive text, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/locuslab/trellisnet |
| Open Datasets | Yes | We evaluate trellis networks on challenging benchmarks, including word-level language modeling on the standard Penn Treebank (PTB) and the much larger WikiText-103 (WT103) datasets; character-level language modeling on Penn Treebank; and standard stress tests (e.g., sequential MNIST, permuted MNIST)... The original Penn Treebank (PTB) dataset... (Marcus et al., 1993)... WikiText-103 (WT103)... (Merity et al., 2017)... The MNIST handwritten digits dataset (LeCun et al., 1989)... The CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... |
| Dataset Splits | Yes | the PTB dataset contains 888K words for training, 70K for validation and 79K for testing... WT103... with 103M words for training, 218K words for validation, and 246K words for testing/evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Table 5 specifies the trellis networks used for the various tasks. There are a few things to note while reading the table. First, in training, we decay the learning rate once the validation error plateaus for a while (or according to some fixed schedule, such as after 100 epochs). Second, for the auxiliary loss (see Appendix B for more details), we insert the loss function after every fixed number of layers in the network; this frequency is reported under the Auxiliary Frequency entry. Finally, the hidden dropout in the table refers to the variational dropout translated from RNNs (see Appendix B), which is applied at all hidden layers of the TrellisNet. |
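
The setup quoted above combines three training mechanisms: learning-rate decay when validation error plateaus, an auxiliary loss inserted after every fixed number of layers, and variational (hidden) dropout applied at all hidden layers. The sketch below is a minimal, hedged illustration of how these pieces could fit together in PyTorch; the module names, layer structure, and hyperparameter values are hypothetical and are not taken from the authors' released code.

```python
# Illustrative sketch only: LR decay on validation plateau, auxiliary losses
# every `aux_freq` layers, and a per-sequence ("variational") dropout mask.
# All names and values here are hypothetical, not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalDropout(nn.Module):
    """Applies one dropout mask per sequence, shared across all time steps."""

    def __init__(self, p=0.25):
        super().__init__()
        self.p = p

    def forward(self, x):                       # x: (batch, time, hidden)
        if not self.training or self.p == 0.0:
            return x
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)          # same mask reused across time


class ToyLayeredLM(nn.Module):
    """Toy layered language model exposing intermediate outputs for auxiliary losses."""

    def __init__(self, vocab, hidden=256, layers=12, aux_freq=4, dropout=0.25):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(layers)])
        self.hidden_drop = VariationalDropout(dropout)
        self.decoder = nn.Linear(hidden, vocab)
        self.aux_freq = aux_freq

    def forward(self, tokens):                  # tokens: (batch, time)
        h, aux_logits = self.embed(tokens), []
        for i, layer in enumerate(self.layers, start=1):
            h = self.hidden_drop(torch.relu(layer(h)))
            if i % self.aux_freq == 0 and i < len(self.layers):
                aux_logits.append(self.decoder(h))   # auxiliary prediction head
        return self.decoder(h), aux_logits


def train(model, train_batches, val_batches, epochs=10, aux_weight=0.3):
    opt = torch.optim.SGD(model.parameters(), lr=20.0)
    # Decay the learning rate once the validation loss stops improving.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=3)
    for _ in range(epochs):
        model.train()
        for tokens, targets in train_batches:   # targets: (batch, time)
            logits, aux = model(tokens)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            for a in aux:                        # add weighted auxiliary losses
                loss = loss + aux_weight * F.cross_entropy(a.flatten(0, 1), targets.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(F.cross_entropy(model(t)[0].flatten(0, 1), y.flatten())
                           for t, y in val_batches) / max(len(val_batches), 1)
        sched.step(val_loss)                     # plateau-based decay step
```

This is only meant to make the quoted setup concrete; the actual TrellisNet training code in the linked repository should be consulted for the real architecture and hyperparameters.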