Trellis Networks for Sequence Modeling

Authors: Shaojie Bai, J. Zico Kolter, Vladlen Koltun

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that trellis networks outperform current state-of-the-art methods on a variety of challenging benchmarks, including word-level and character-level language modeling tasks and stress tests designed to evaluate long-term memory retention.
Researcher Affiliation | Collaboration | Shaojie Bai (Carnegie Mellon University); J. Zico Kolter (Carnegie Mellon University and Bosch Center for AI); Vladlen Koltun (Intel Labs)
Pseudocode | No | The paper describes the Trellis Network architecture and its computations using mathematical equations and descriptive text, but it does not include any formal pseudocode or algorithm blocks. (A simplified sketch of the layer update is given after the table.)
Open Source Code | Yes | The code is available at https://github.com/locuslab/trellisnet
Open Datasets | Yes | We evaluate trellis networks on challenging benchmarks, including word-level language modeling on the standard Penn Treebank (PTB) and the much larger WikiText-103 (WT103) datasets; character-level language modeling on Penn Treebank; and standard stress tests (e.g. sequential MNIST, permuted MNIST, etc.)... The original Penn Treebank (PTB) dataset... (Marcus et al., 1993)... WikiText-103 (WT103)... (Merity et al., 2017)... The MNIST handwritten digits dataset (LeCun et al., 1989)... The CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... (A sketch of the sequential/permuted MNIST construction follows the table.)
Dataset Splits | Yes | The PTB dataset contains 888K words for training, 70K for validation, and 79K for testing... WT103... with 103M words for training, 218K words for validation, and 246K words for testing/evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Table 5 specifies the trellis networks used for the various tasks. There are a few things to note while reading the table. First, in training, we decay the learning rate once the validation error plateaus for a while (or according to some fixed schedule, such as after 100 epochs). Second, for the auxiliary loss (see Appendix B for more details), we insert the loss function after every fixed number of layers in the network; this frequency is included under the Auxiliary Frequency entry. Finally, the hidden dropout in the table refers to the variational dropout translated from RNNs (see Appendix B), which is applied at all hidden layers of the TrellisNet. (A sketch of these training mechanics follows the table.)
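
As the Pseudocode row notes, the paper specifies the architecture only through equations and prose. The sketch below is a minimal, simplified rendering of the weight-tied TrellisNet update described there: a kernel-size-2 causal convolution over the concatenation of the injected input and the previous level's hidden units, followed by an LSTM-style gated activation, with the same weights reused at every level. The module name, tensor layout, and exact gate ordering are assumptions on our part; the reference implementation at https://github.com/locuslab/trellisnet is authoritative.

```python
# Minimal sketch of a weight-tied TrellisNet-style layer (an assumption-laden
# paraphrase of the paper's equations, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleTrellisLayer(nn.Module):
    """One weight-tied transformation, reused at every level of the trellis."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Kernel-size-2 causal convolution over [injected input; previous-level
        # hidden state], producing four LSTM-style gate pre-activations.
        self.conv = nn.Conv1d(input_size + hidden_size, 4 * hidden_size, kernel_size=2)

    def forward(self, x, z, c):
        # x: (batch, input_size, T)  raw input, re-injected at every level
        # z: (batch, hidden_size, T) hidden output of the previous level
        # c: (batch, hidden_size, T) cell state carried alongside z
        inp = torch.cat([x, z], dim=1)
        inp = F.pad(inp, (1, 0))            # left-pad so the convolution stays causal
        gates = self.conv(inp)
        i, o, f, g = gates.chunk(4, dim=1)  # gate ordering here is an assumption
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        z_new = torch.sigmoid(o) * torch.tanh(c_new)
        return z_new, c_new


def trellis_forward(layer, x, num_levels):
    """Unroll the trellis: apply the SAME layer (same weights) num_levels times."""
    batch, _, seq_len = x.shape
    z = x.new_zeros(batch, layer.hidden_size, seq_len)
    c = x.new_zeros(batch, layer.hidden_size, seq_len)
    for _ in range(num_levels):
        z, c = layer(x, z, c)
    return z
```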
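The sequential and permuted MNIST stress tests referenced in the Open Datasets row are standard constructions: each 28x28 digit is flattened into a length-784 pixel sequence, and the permuted variant applies one fixed shuffle of the pixel positions to every image. The sketch below shows one way to build them, assuming torchvision is available; the data path, seed, and helper name are illustrative and not taken from the paper.

```python
# Hedged sketch of the sequential / permuted MNIST stress-test setup.
import torch
from torchvision import datasets, transforms


def load_sequential_mnist(root="./data", permuted=False, seed=0):
    """Return MNIST as length-784 pixel sequences, plus an optional fixed permutation."""
    to_sequence = transforms.Compose([
        transforms.ToTensor(),                           # (1, 28, 28) in [0, 1]
        transforms.Lambda(lambda img: img.view(-1, 1)),  # (784, 1) pixel sequence
    ])
    train = datasets.MNIST(root, train=True, download=True, transform=to_sequence)
    test = datasets.MNIST(root, train=False, download=True, transform=to_sequence)

    permutation = None
    if permuted:
        # One fixed random shuffle of the 784 pixel positions, shared by all
        # images; apply it as seq[permutation] in the training loop.
        permutation = torch.randperm(784, generator=torch.Generator().manual_seed(seed))
    return train, test, permutation
```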
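The Experiment Setup row quotes three training details: learning-rate decay when the validation error plateaus, an auxiliary loss inserted after every fixed number of layers, and variational ("hidden") dropout applied at all hidden layers. The sketch below illustrates those mechanics only; the hyperparameter values, the use of ReduceLROnPlateau, and the hypothetical model interface (layer, decoder, hidden_size, num_levels) are assumptions, not the authors' training script.

```python
# Hedged sketch of the quoted training details, under assumed hyperparameters
# and a hypothetical TrellisNet-like `model` interface.
import torch
import torch.nn as nn


def make_optimizer_and_scheduler(model, lr=20.0, factor=0.5, patience=5):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # Decay the learning rate once the validation metric stops improving;
    # call scheduler.step(val_loss) after every epoch.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=factor, patience=patience)
    return optimizer, scheduler


def training_loss(model, x, targets, aux_frequency=2, aux_weight=0.3, hidden_dropout=0.2):
    criterion = nn.CrossEntropyLoss()
    # Variational ("hidden") dropout: sample ONE mask per sequence and reuse it
    # at every hidden level, rather than resampling per level.
    keep = 1.0 - hidden_dropout
    mask = torch.bernoulli(
        torch.full((x.size(0), model.hidden_size, 1), keep, device=x.device)) / keep

    loss = 0.0
    z = torch.zeros(x.size(0), model.hidden_size, x.size(-1), device=x.device)
    for level in range(model.num_levels):
        z = model.layer(x, z) * mask        # same dropout mask at all hidden levels
        # Insert an auxiliary loss after every `aux_frequency` levels (except the last).
        if (level + 1) % aux_frequency == 0 and level + 1 < model.num_levels:
            loss = loss + aux_weight * criterion(model.decoder(z), targets)
    return loss + criterion(model.decoder(z), targets)
```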