Bilateral Multi-Perspective Matching for Natural Language Sentences
Authors: Zhiguo Wang, Wael Hamza, Radu Florian
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves the state-of-the-art performance on all tasks. |
| Researcher Affiliation | Industry | Zhiguo Wang, Wael Hamza, Radu Florian IBM T.J. Watson Research Center {zhigwang,whamza,raduf}@us.ibm.com |
| Pseudocode | No | The paper describes the model architecture and mathematical formulas but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | We will release our source code and the dataset partition at https://zhiguowang.github.io/ . |
| Open Datasets | Yes | We choose the paraphrase identification task, and experiment on the Quora Question Pairs dataset (https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs). This dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary value indicating whether the two questions are paraphrases of each other. We randomly select 5,000 paraphrases and 5,000 non-paraphrases as the dev set, and sample another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. In this subsection, we evaluate our model on the natural language inference task over the SNLI dataset [Bowman et al., 2015]. We experiment on two datasets: TREC-QA [Wang et al., 2007] and WikiQA [Yang et al., 2015]. |
| Dataset Splits | Yes | We randomly select 5,000 paraphrases and 5,000 non-paraphrases as the dev set, and sample another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'GloVe', 'word2vec', 'LSTM', and 'ADAM optimizer', but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We initialize word embeddings in the word representation layer with the 300-dimensional GloVe word vectors... For the character-composed embeddings, we initialize each character as a 20-dimensional vector, and compose each word into a 50-dimensional vector with a LSTM layer. We set the hidden size as 100 for all BiLSTM layers. We apply dropout to every layer in Figure 1, and set the dropout ratio as 0.1. To train the model, we minimize the cross entropy of the training set, and use the ADAM optimizer [Kingma and Ba, 2014] to update parameters. We set the learning rate as 0.001. |
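
The "Dataset Splits" row above fully specifies how the Quora Question Pairs data were partitioned (5,000 paraphrases plus 5,000 non-paraphrases each for dev and test, remainder for training), so the split can be reconstructed. The sketch below is a minimal Python reconstruction under assumptions: the file name, column names, and random seed are placeholders, since the paper only states the sampling counts and releases its own partition separately.

```python
"""Minimal sketch of the Quora Question Pairs split described in the paper.

Assumptions (not from the paper): the TSV file name, the column layout
(question1, question2, is_duplicate), and the random seed.
"""
import csv
import random

random.seed(0)  # assumed seed; the paper does not specify one

with open("quora_duplicate_questions.tsv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

positives = [r for r in rows if r["is_duplicate"] == "1"]
negatives = [r for r in rows if r["is_duplicate"] == "0"]
random.shuffle(positives)
random.shuffle(negatives)

# 5,000 paraphrases + 5,000 non-paraphrases for dev, the same again for test,
# and everything that remains for training.
dev = positives[:5000] + negatives[:5000]
test = positives[5000:10000] + negatives[5000:10000]
train = positives[10000:] + negatives[10000:]

print(len(train), len(dev), len(test))
```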
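The "Experiment Setup" row likewise lists concrete hyperparameters: 300-dimensional GloVe word vectors, 20-dimensional character embeddings composed into 50-dimensional vectors by an LSTM, BiLSTM hidden size 100, dropout 0.1, cross-entropy loss, and Adam with learning rate 0.001. Below is a hedged sketch of the word-representation and context layers under those settings. PyTorch is an assumption (the paper does not name a framework), the vocabulary sizes and GloVe tensor are placeholders, and the matching and aggregation layers of BiMPM are omitted.

```python
"""Hedged PyTorch sketch of the hyperparameters quoted in the Experiment Setup row.

Only the word-representation and context layers are shown; the BiMPM matching
and aggregation layers are omitted. Vocabulary sizes and the GloVe matrix are
placeholders, not values from the paper.
"""
import torch
import torch.nn as nn

WORD_DIM, CHAR_DIM, CHAR_COMPOSED_DIM, HIDDEN = 300, 20, 50, 100
DROPOUT, LR = 0.1, 1e-3


class WordRepresentation(nn.Module):
    """300-d GloVe word embedding concatenated with a 50-d character-composed
    embedding produced by an LSTM over 20-d character vectors."""

    def __init__(self, glove_weights, num_chars):
        super().__init__()
        # freeze=True is an assumption; the quoted setup does not say whether
        # word embeddings are updated during training.
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.char_emb = nn.Embedding(num_chars, CHAR_DIM, padding_idx=0)
        self.char_lstm = nn.LSTM(CHAR_DIM, CHAR_COMPOSED_DIM, batch_first=True)
        self.dropout = nn.Dropout(DROPOUT)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, s, w = char_ids.shape
        chars = self.char_emb(char_ids.view(b * s, w))
        _, (h, _) = self.char_lstm(chars)            # final hidden state per word
        char_composed = h[-1].view(b, s, CHAR_COMPOSED_DIM)
        words = self.word_emb(word_ids)
        return self.dropout(torch.cat([words, char_composed], dim=-1))


# Placeholder GloVe matrix; the real 300-d vectors would be loaded from file.
glove_weights = torch.randn(10000, WORD_DIM)
word_repr = WordRepresentation(glove_weights, num_chars=100)

# Context representation layer: BiLSTM with hidden size 100 per direction.
context_lstm = nn.LSTM(WORD_DIM + CHAR_COMPOSED_DIM, HIDDEN,
                       batch_first=True, bidirectional=True)

# Objective and optimizer as stated in the paper (cross entropy, Adam, lr 0.001).
params = list(word_repr.parameters()) + list(context_lstm.parameters())
optimizer = torch.optim.Adam(params, lr=LR)
criterion = nn.CrossEntropyLoss()
```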