Recurrently Controlled Recurrent Networks
Authors: Yi Tay, Anh Tuan Luu, Siu Cheung Hui
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on a myriad of tasks in the NLP domain such as sentiment analysis (SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading comprehension (NarrativeQA). Across all 26 datasets, our results demonstrate that RCRN not only consistently outperforms BiLSTMs but also stacked BiLSTMs, suggesting that our controller architecture might be a suitable replacement for the widely adopted stacked architecture. |
| Researcher Affiliation | Academia | Yi Tay (1), Luu Anh Tuan (2), and Siu Cheung Hui (3); (1, 3) Nanyang Technological University; (2) Institute for Infocomm Research; ytay017@ntu.edu.sg, at.luu@i2r.a-star.edu.sg, asschui@ntu.edu.sg |
| Pseudocode | No | The paper describes the model architecture through mathematical equations (1-16) but does not include any block explicitly labeled 'Pseudocode' or 'Algorithm'. (A hedged sketch of the controller-gated recurrence, written for this report, appears after this table.) |
| Open Source Code | Yes | The source code of our model can be found at https://github.com/vanzytay/NIPS2018_RCRN. |
| Open Datasets | Yes | We conduct extensive experiments on a myriad of tasks in the NLP domain such as sentiment analysis (SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading comprehension (NarrativeQA). More concretely, we use 16 Amazon review datasets from [Liu et al., 2017], the well-established Stanford Sentiment Treebank (SST-5/SST-2) [Socher et al., 2013] and the IMDb Sentiment dataset [Maas et al., 2011]. ... We use the TREC question classification dataset [Voorhees et al., 1999]. ... We use two popular benchmark datasets, i.e., the Stanford Natural Language Inference (SNLI) corpus [Bowman et al., 2015], and SciTail (Science Entailment) [Khot et al., 2018] datasets. ... We use the popular WikiQA [Yang et al., 2015] and TrecQA [Wang et al., 2007] datasets. ... We use the recent NarrativeQA [Kočiský et al., 2017] dataset... |
| Dataset Splits | No | The paper mentions using standard benchmark datasets and tuning hyperparameters (e.g., 'learning rate is tuned amongst {0.001, 0.0003, 0.0004}'), which implies the use of validation sets, but it never explicitly states the percentages or sample counts of the training/validation/test splits for any of the datasets used. |
| Hardware Specification | Yes | We use the same standard hardware (a single Nvidia GTX1070 card) and an identical overarching model architecture. |
| Software Dependencies | No | The paper mentions using 'CUDNN optimized version' and states 'We adapt the CUDA kernel as a custom Tensorflow op in our experiments', but it does not provide specific version numbers for TensorFlow, CUDNN, or CUDA. |
| Experiment Setup | Yes | In this section, we describe the task-specific model architectures for each task. Classification Model ... We use 300D GloVe [Pennington et al., 2014] vectors with 600D CoVe [McCann et al., 2017] vectors as pretrained embedding vectors. ... The output of the embedding layer is passed into the RCRN model directly ... Word embeddings are not updated during training. Given the hidden output states of the 200d dimensional RCRN cell, we take the concatenation of the max, mean and min pooling of all hidden states to form the final feature vector. This feature vector is passed into a single dense layer with ReLU activations of 200d dimensions. The output of this layer is then passed into a softmax layer for classification. This model optimizes the cross entropy loss. We train this model using Adam [Kingma and Ba, 2014] and learning rate is tuned amongst {0.001, 0.0003, 0.0004}. Entailment Model ... two layer highway network [Srivastava et al., 2015] of 300 hidden dimensions ... We train this model using Adam and learning rate is tuned amongst {0.001, 0.0003, 0.0004}. Ranking Model ... The dimensionality is set to 200. The similarity scoring function is the cosine similarity and the objective function is the pairwise hinge loss with a margin of 0.1. We use negative sampling of n = 6 to train our model. We train our model using Adadelta [Zeiler, 2012] with a learning rate of 0.2. Reading Comprehension Model ... The dimensionality of the encoder is set to 75. We train both models using Adam with a learning rate of 0.001. (Minimal sketches of the classification head and the ranking loss, as described here, appear after this table.) |
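
The paper specifies RCRN only through equations (1-16), with no pseudocode block. As a loose illustration of the core idea (controller RNNs, rather than input-conditioned gates, drive the forget and output gates of the main "listener" cell), here is a minimal NumPy sketch. The gate wiring, function names, and shapes are our simplifying assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rcrn_listener(x, f_ctrl, o_ctrl, W_z, b_z):
    """Sketch of a controller-gated recurrence (our simplification,
    not the paper's exact equations 1-16).

    x:      (T, d_in) input sequence
    f_ctrl: (T, d)    hidden states of a 'forget' controller RNN
    o_ctrl: (T, d)    hidden states of an 'output' controller RNN
    W_z:    (d_in, d) input projection; b_z: (d,)
    """
    T, d = f_ctrl.shape
    c = np.zeros(d)                        # listener cell state
    h = np.zeros((T, d))
    for t in range(T):
        z = np.tanh(x[t] @ W_z + b_z)      # candidate state from the input
        f = sigmoid(f_ctrl[t])             # forget gate set by the controller
        o = sigmoid(o_ctrl[t])             # output gate set by the controller
        c = f * c + (1.0 - f) * z          # controller-gated cell update
        h[t] = o * np.tanh(c)              # controller-gated output
    return h
```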
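
The classification head quoted in the Experiment Setup row (concatenated max/mean/min pooling over all RCRN hidden states, a 200-dimensional ReLU dense layer, then softmax) is straightforward to reconstruct. A minimal sketch, assuming `h` holds the hidden states and the weight shapes noted below:

```python
import numpy as np

def classification_head(h, W1, b1, W2, b2):
    """h: (T, d) RCRN hidden states; W1: (3*d, 200); W2: (200, n_classes).
    Pool-concat -> ReLU dense -> softmax, per the setup described in the paper."""
    feats = np.concatenate([h.max(axis=0), h.mean(axis=0), h.min(axis=0)])
    hidden = np.maximum(0.0, feats @ W1 + b1)   # single ReLU dense layer
    logits = hidden @ W2 + b2
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()
```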
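
The ranking objective is also stated precisely (cosine similarity scoring, pairwise hinge loss with margin 0.1, n = 6 negative samples). The hinge form below is the standard pairwise ranking loss; treating it as the paper's exact formula is our assumption, since the paper does not print it.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def pairwise_hinge_loss(q, pos, negs, margin=0.1):
    """q: query vector; pos: positive answer vector; negs: the n = 6 sampled
    negative answer vectors. Standard pairwise hinge over cosine scores."""
    s_pos = cosine(q, pos)
    return sum(max(0.0, margin - s_pos + cosine(q, n)) for n in negs)
```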