Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Authors: Trieu Trinh, Andrew Dai, Thang Luong, Quoc Le
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16,000, and a real document classification benchmark. Our results highlight good performance and resource efficiency of this approach over competitive baselines, including other recurrent models and a comparable sized Transformer. |
| Researcher Affiliation | Industry | Trieu H. Trinh, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le ({thtrieu, adai, thangluong, qvl}@google.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that models are implemented in TensorFlow and uses Tensor2Tensor (a third-party library), but does not provide concrete access to the source code for the specific methodology described in the paper. It only links to the general Tensor2Tensor repository. |
| Open Datasets | Yes | We evaluate our method on a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16,000, and a real document classification benchmark. Our results highlight good performance... Datasets include MNIST, pMNIST, CIFAR10, Stanford Dogs (Khosla et al., 2011), and DBpedia (Zhang et al., 2015). (A data-preparation sketch for the pixel-by-pixel setting follows the table.) |
| Dataset Splits | No | The paper lists training set sizes for various datasets but does not explicitly provide validation dataset split information (percentages, sample counts, or specific methodology for splits) needed to reproduce data partitioning for validation. |
| Hardware Specification | Yes | We therefore restrict each training session to the same amount of resource (a single Tesla P100 GPU) and report infeasible whenever a mini-batch of one training example can no longer fit into memory. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'Tensor2Tensor' but does not provide specific version numbers for these software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use a single-layer LSTM with 128 cells and an embedding size of 128 to read the input sequence. Our RNNs are trained using the RMSProp optimizer (Tieleman & Hinton, 2012) with batch size of 128. Unsupervised pretraining is done in 100 epochs with initial learning rate of 0.001 ... gradients are truncated to 300 time steps ... sampling n=1 segment of length l=600 per training example. (A configuration sketch based on these values follows the table.) |
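
The pixel-by-pixel setting quoted in the Open Datasets row treats each image as a flat sequence of pixel intensities, with pMNIST applying a fixed permutation to every sequence. Below is a minimal data-preparation sketch using standard tf.keras dataset utilities; it is not the authors' code, and the permutation seed is an arbitrary assumption.

```python
# Minimal sketch: flatten MNIST images into pixel-by-pixel sequences.
# Not the authors' code; illustrates the data layout described in the paper.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Each 28x28 image becomes a length-784 sequence of integer pixel values,
# which a model can read through an embedding layer.
x_train = x_train.reshape(-1, 28 * 28).astype("int32")
x_test = x_test.reshape(-1, 28 * 28).astype("int32")

# pMNIST applies one fixed random permutation to every sequence
# (the seed below is an assumption, not taken from the paper).
perm = np.random.RandomState(0).permutation(28 * 28)
x_train_p, x_test_p = x_train[:, perm], x_test[:, perm]
```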
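The Experiment Setup row can be read as a concrete model and optimizer configuration. The sketch below instantiates the quoted hyperparameters (single-layer LSTM with 128 cells, 128-dimensional input embedding, RMSProp at learning rate 0.001, batch size 128) with tf.keras; it illustrates the reported setup rather than reproducing the paper's implementation, and it omits the unsupervised auxiliary-loss pretraining and the 300-step gradient truncation.

```python
# Minimal sketch of the quoted supervised setup; not the paper's implementation.
import tensorflow as tf

NUM_CLASSES = 10   # e.g. MNIST or CIFAR-10
VOCAB_SIZE = 256   # discrete pixel intensities fed to the embedding (assumption)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=128),  # embedding size 128
    tf.keras.layers.LSTM(128),                                        # single-layer LSTM, 128 cells
    tf.keras.layers.Dense(NUM_CLASSES),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),  # RMSProp, lr 0.001
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Training with the reported batch size:
# model.fit(x_train, y_train, batch_size=128, validation_data=(x_test, y_test))
```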