Semantic Sentence Matching with Densely-Connected Recurrent and Co-Attentive Information
Authors: Seonhoon Kim, Inho Kang, Nojun Kwak (pp. 6586-6593)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed architecture on highly competitive benchmark datasets related to sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performances for most of the tasks. |
| Researcher Affiliation | Collaboration | Seonhoon Kim,1,2 Inho Kang,1 Nojun Kwak2 1Search&Clova, Naver Corp. 2Seoul National University {seonhoon.kim, once.ihkang}@navercorp.com, nojunk@snu.ac.kr |
| Pseudocode | No | The paper describes its architecture and equations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete links or statements about the availability of its source code. |
| Open Datasets | Yes | We evaluate our matching model on five popular and well-studied benchmark datasets for three challenging sentence matching tasks: (i) SNLI and MultiNLI for natural language inference; (ii) Quora Question Pair for paraphrase identification; and (iii) TrecQA and SelQA for answer sentence selection in question answering. Additional details about the above datasets can be found in the supplementary materials. |
| Dataset Splits | Yes | The learning parameters were selected based on the best performance on the dev set. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using GloVe for word embeddings and the RMSProp optimizer but does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | We initialized word embedding with 300d GloVe vectors pre-trained from the 840B Common Crawl corpus (Pennington, Socher, and Manning 2014), while the word embeddings for the out-of-vocabulary words were initialized randomly. We also randomly initialized character embedding with a 16d vector and extracted 32d character representation with a convolutional network. For the densely-connected recurrent layers, we stacked 5 layers each of which have 100 hidden units. We set 1000 hidden units with respect to the fully-connected layers. The dropout was applied after the word and character embedding layers with a keep rate of 0.5. It was also applied before the fully-connected layers with a keep rate of 0.8. For the bottleneck component, we set 200 hidden units as encoded features of the autoencoder with a dropout rate of 0.2. The batch normalization was applied on the fully-connected layers, only for the one-way type datasets. The RMSProp optimizer with an initial learning rate of 0.001 was applied. The learning rate was decreased by a factor of 0.85 when the dev accuracy does not improve. All weights except embedding matrices are constrained by L2 regularization with a regularization constant λ = 10⁻⁶. The sequence lengths of the sentence are all different for each dataset: 35 for SNLI, 55 for MultiNLI, 25 for Quora question pair and 50 for TrecQA. The learning parameters were selected based on the best performance on the dev set. We employed 8 different randomly initialized sets of parameters with the same model for our ensemble approach. |
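
The Experiment Setup row describes the encoder configuration: 5 densely-connected recurrent layers of 100 hidden units each and a 200-unit autoencoder bottleneck with a dropout rate of 0.2. Since the paper releases no code, the following is only a minimal PyTorch sketch of that dense stacking pattern; it omits the character CNN, the co-attentive features that the full DRCN also concatenates, and the decoder/reconstruction side of the autoencoder, and the class name `DenselyConnectedRNNEncoder` is hypothetical.

```python
import torch
import torch.nn as nn


class DenselyConnectedRNNEncoder(nn.Module):
    """Stack of BiLSTM layers where each layer's input is the concatenation
    of the original input and all preceding layers' outputs (a sketch of the
    densely-connected recurrent part only, not the authors' implementation)."""

    def __init__(self, input_dim: int, hidden_units: int = 100,
                 num_layers: int = 5, bottleneck_dim: int = 200,
                 bottleneck_dropout: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = input_dim
        for _ in range(num_layers):
            self.layers.append(
                nn.LSTM(in_dim, hidden_units,
                        batch_first=True, bidirectional=True))
            # Dense connection: the next layer sees everything so far
            # plus this layer's 2 * hidden_units bidirectional output.
            in_dim += 2 * hidden_units
        # Encoder half of the autoencoder bottleneck: compress the
        # concatenated features to 200 units, dropout rate 0.2.
        self.bottleneck = nn.Sequential(
            nn.Dropout(bottleneck_dropout),
            nn.Linear(in_dim, bottleneck_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. concatenated word + char features
        features = x
        for lstm in self.layers:
            out, _ = lstm(features)
            features = torch.cat([features, out], dim=-1)
        return self.bottleneck(features)  # (batch, seq_len, bottleneck_dim)


# Quick shape check, assuming 300d GloVe + 32d char representation = 332d input
# and the SNLI sequence length of 35 quoted above.
encoder = DenselyConnectedRNNEncoder(input_dim=332)
out = encoder(torch.randn(2, 35, 332))
print(out.shape)  # torch.Size([2, 35, 200])
```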
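
The same row quotes the optimization settings: RMSProp with an initial learning rate of 0.001, a 0.85 learning-rate decay when dev accuracy stops improving, and L2 regularization of 10⁻⁶ on all weights except the embedding matrices. Below is a hedged sketch of how those settings could be wired up in PyTorch, reusing the `DenselyConnectedRNNEncoder` from the previous sketch as a stand-in model; the framework choice and the `"embedding"` name check are assumptions, not details from the paper.

```python
import torch

# Stand-in for the full DRCN model, which the paper does not release.
model = DenselyConnectedRNNEncoder(input_dim=332)

# L2 regularization of 1e-6 on all weights except embedding matrices.
# This toy encoder has no embedding layer, so the name check is illustrative.
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if "embedding" in name else decay).append(param)

optimizer = torch.optim.RMSprop(
    [{"params": decay, "weight_decay": 1e-6},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=1e-3,  # initial learning rate 0.001
)

# "Decreased by a factor of 0.85 when the dev accuracy does not improve"
# maps onto ReduceLROnPlateau in 'max' mode, stepped with dev accuracy.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.85)

# After each dev-set evaluation:
#   scheduler.step(dev_accuracy)
```

One note on the quoted dropout values: the paper gives keep rates (0.5 after the embedding layers, 0.8 before the fully-connected layers), whereas PyTorch's `nn.Dropout` takes a drop probability, so these would correspond to `nn.Dropout(0.5)` and `nn.Dropout(0.2)` respectively.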