Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Authors: Trieu Trinh, Andrew Dai, Thang Luong, Quoc Le
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16,000, and a real document classification benchmark. Our results highlight good performance and resource efficiency of this approach over competitive baselines, including other recurrent models and a comparable sized Transformer. |
| Researcher Affiliation | Industry | Trieu H. Trinh, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le ({thtrieu, adai, thangluong, qvl}@google.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that models are implemented in TensorFlow and uses Tensor2Tensor (a third-party library), but does not provide concrete access to the source code for the specific methodology described in the paper. It only links to the general Tensor2Tensor repository. |
| Open Datasets | Yes | We evaluate our method on a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16,000, and a real document classification benchmark. Our results highlight good performance... Datasets include MNIST, pMNIST, CIFAR10, Stanford Dogs (Khosla et al., 2011), and DBpedia (Zhang et al., 2015). (A data-preparation sketch for the pixel-by-pixel setting follows the table.) |
| Dataset Splits | No | The paper lists training set sizes for various datasets but does not explicitly provide validation dataset split information (percentages, sample counts, or specific methodology for splits) needed to reproduce data partitioning for validation. |
| Hardware Specification | Yes | We therefore restrict each training session to the same amount of resource (a single Tesla P100 GPU) and report infeasible whenever a mini-batch of one training example can no longer fit into memory. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'Tensor2Tensor' but does not provide specific version numbers for these software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use a single-layer LSTM with 128 cells and an embedding size of 128 to read the input sequence. Our RNNs are trained using the RMSProp optimizer (Tieleman & Hinton, 2012) with batch size of 128. Unsupervised pretraining is done in 100 epochs with initial learning rate of 0.001 ... gradients are truncated to 300 time steps ... sampling n=1 segment of length l=600 per training example. (A configuration sketch based on these values follows the table.) |
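
The pixel-by-pixel setting quoted in the Open Datasets row treats each image as a flat sequence of pixel intensities, with pMNIST applying a fixed permutation to every sequence. Below is a minimal data-preparation sketch using standard tf.keras dataset utilities; it is not the authors' code, and the permutation seed is an arbitrary assumption.

```python
# Minimal sketch: flatten MNIST images into pixel-by-pixel sequences.
# Not the authors' code; illustrates the data layout described in the paper.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Each 28x28 image becomes a length-784 sequence of integer pixel values,
# which a model can read through an embedding layer.
x_train = x_train.reshape(-1, 28 * 28).astype("int32")
x_test = x_test.reshape(-1, 28 * 28).astype("int32")

# pMNIST applies one fixed random permutation to every sequence
# (the seed below is an assumption, not taken from the paper).
perm = np.random.RandomState(0).permutation(28 * 28)
x_train_p, x_test_p = x_train[:, perm], x_test[:, perm]
```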
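The Experiment Setup row can be read as a concrete model and optimizer configuration. The sketch below instantiates the quoted hyperparameters (single-layer LSTM with 128 cells, 128-dimensional input embedding, RMSProp at learning rate 0.001, batch size 128) with tf.keras; it illustrates the reported setup rather than reproducing the paper's implementation, and it omits the unsupervised auxiliary-loss pretraining and the 300-step gradient truncation.

```python
# Minimal sketch of the quoted supervised setup; not the paper's implementation.
import tensorflow as tf

NUM_CLASSES = 10   # e.g. MNIST or CIFAR-10
VOCAB_SIZE = 256   # discrete pixel intensities fed to the embedding (assumption)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=128),  # embedding size 128
    tf.keras.layers.LSTM(128),                                        # single-layer LSTM, 128 cells
    tf.keras.layers.Dense(NUM_CLASSES),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),  # RMSProp, lr 0.001
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Training with the reported batch size:
# model.fit(x_train, y_train, batch_size=128, validation_data=(x_test, y_test))
```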