Multiway Attention Networks for Modeling Sentence Pairs
Authors: Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed multiway attention networks improve the result on the Quora Question Pairs, SNLI, MultiNLI, and answer sentence selection task on the SQuAD dataset. |
| Researcher Affiliation | Collaboration | State Key Laboratory of Software Development Environment, Beihang University, China; Microsoft Research, Beijing, China; Peking University, Beijing, China |
| Pseudocode | No | The paper includes diagrams and descriptions of the model architecture and processes, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | Yes | Quora Question Pairs: This dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary value indicating whether the two questions are paraphrase of each other. SNLI: It is a natural language inference dataset [Bowman et al., 2015]. MultiNLI: It is a natural language inference dataset [Williams et al., 2017]. SQuAD: It is a reading comprehension dataset, where the answer to each question is a span of text from the corresponding passage [Rajpurkar et al., 2016]. |
| Dataset Splits | Yes | Quora Question Pairs... We select 5,000 paraphrases and 5,000 non-paraphrases as the development set, and use another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. SNLI... we have 549,367 pairs for training, 9,842 pairs for development and 9,824 pairs for test. MultiNLI... This dataset contains 392,702 pairs for training, 9,815 matched pairs and 9,832 mismatched pairs for development, 9,796 matched pairs and 9,847 mismatched pairs for test. SQuAD... we split the 10,570 instances in the development set to 5,000 for development and 5,570 for test. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as specific GPU or CPU models, memory, or other detailed computing specifications. |
| Software Dependencies | No | The paper mentions using GloVe embeddings, a pre-trained language model (ELMo), GRU, AdaDelta, dropout, and the Stanford CoreNLP Toolkit, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We use 300-dimensional uncased pre-trained GloVe embeddings without update during training. Hidden vector length is set to 150 for all layers. We apply dropout between layers, with dropout rate 0.2. The model is optimized using AdaDelta with initial learning rate of 1.0. |
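
Since no source code was released, the reported setup and the Quora Question Pairs split can only be approximated. The following is a minimal sketch, not the authors' implementation: `ExperimentConfig` and `balanced_split` are hypothetical names, the hyperparameter values are taken from the Experiment Setup row, and the random shuffling and `seed` are assumptions, because the paper does not say how the development and test instances were selected.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as reported in the Experiment Setup row."""
    embedding_dim: int = 300         # uncased pre-trained GloVe vectors
    update_embeddings: bool = False  # embeddings are not updated in training
    hidden_size: int = 150           # hidden vector length for all layers
    dropout_rate: float = 0.2        # dropout applied between layers
    optimizer: str = "adadelta"
    learning_rate: float = 1.0       # initial learning rate


def balanced_split(pairs, per_class=5000, seed=0):
    """Split (question1, question2, label) triples into train/dev/test.

    Dev and test each receive `per_class` paraphrases (label 1) and
    `per_class` non-paraphrases (label 0); the rest becomes training data.
    Random selection with a fixed seed is an assumption of this sketch.
    """
    positives = [p for p in pairs if p[2] == 1]
    negatives = [p for p in pairs if p[2] == 0]
    if min(len(positives), len(negatives)) < 2 * per_class:
        raise ValueError("not enough instances of each class for dev and test")

    rng = random.Random(seed)
    rng.shuffle(positives)
    rng.shuffle(negatives)

    dev = positives[:per_class] + negatives[:per_class]
    test = (positives[per_class:2 * per_class]
            + negatives[per_class:2 * per_class])
    train = positives[2 * per_class:] + negatives[2 * per_class:]
    return train, dev, test
```

With the default `per_class=5000`, the development and test sets each contain 10,000 pairs, which matches the sizes described in the Dataset Splits row; the resulting training set depends on the total number of pairs in the copy of the dataset being used.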