Bilateral Multi-Perspective Matching for Natural Language Sentences

Authors: Zhiguo Wang, Wael Hamza, Radu Florian

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves the state-of-the-art performance on all tasks.
Researcher Affiliation | Industry | Zhiguo Wang, Wael Hamza, Radu Florian, IBM T.J. Watson Research Center, {zhigwang,whamza,raduf}@us.ibm.com
Pseudocode | No | The paper describes the model architecture and mathematical formulas but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | We will release our source code and the dataset partition at https://zhiguowang.github.io/.
Open Datasets | Yes | We choose the paraphrase identification task, and experiment on the Quora Question Pairs dataset (https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs). This dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary value indicating whether the two questions are paraphrases of each other. We randomly select 5,000 paraphrases and 5,000 non-paraphrases as the dev set, and sample another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. In this subsection, we evaluate our model on the natural language inference task over the SNLI dataset [Bowman et al., 2015]. We experiment on two datasets: TREC-QA [Wang et al., 2007] and WikiQA [Yang et al., 2015].
Dataset Splits | Yes | We randomly select 5,000 paraphrases and 5,000 non-paraphrases as the dev set, and sample another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. (A Python sketch of this split follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running the experiments.
Software Dependencies | No | The paper mentions software components like 'GloVe', 'word2vec', 'LSTM', and 'ADAM optimizer', but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We initialize word embeddings in the word representation layer with the 300-dimensional GloVe word vectors... For the character-composed embeddings, we initialize each character as a 20-dimensional vector, and compose each word into a 50-dimensional vector with a LSTM layer. We set the hidden size as 100 for all BiLSTM layers. We apply dropout to every layer in Figure 1, and set the dropout ratio as 0.1. To train the model, we minimize the cross entropy of the training set, and use the ADAM optimizer [Kingma and Ba, 2014] to update parameters. We set the learning rate as 0.001. (A configuration sketch with these hyperparameters follows the table.)
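
The dataset partition quoted under Dataset Splits (5,000 paraphrases and 5,000 non-paraphrases each for dev and test, with the remainder as training data) is simple to reproduce. The following is a minimal Python sketch, assuming a tab-separated Quora file with an is_duplicate label column; the column name, the random seed, and the use of pandas are assumptions, and the partition released at https://zhiguowang.github.io/ should be preferred for exact reproduction.

import pandas as pd

# Minimal sketch of the Quora Question Pairs split described in the paper:
# 5,000 paraphrases + 5,000 non-paraphrases for dev, the same again for test,
# and all remaining pairs for training. The column name is an assumption.
def split_quora(tsv_path, seed=0):
    df = pd.read_csv(tsv_path, sep="\t")
    df = df.sample(frac=1.0, random_state=seed)   # shuffle before sampling

    pos = df[df["is_duplicate"] == 1]             # paraphrase pairs
    neg = df[df["is_duplicate"] == 0]             # non-paraphrase pairs

    dev = pd.concat([pos.iloc[:5000], neg.iloc[:5000]])
    test = pd.concat([pos.iloc[5000:10000], neg.iloc[5000:10000]])
    train = df.drop(dev.index).drop(test.index)   # everything else
    return train, dev, test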
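
The hyperparameters quoted under Experiment Setup translate directly into model and optimizer configuration. Below is a minimal sketch in Python with PyTorch covering only the word representation and context layers; the class and variable names are mine, the vocabulary sizes and GloVe weight matrix are placeholders, and the paper's multi-perspective matching and aggregation layers are deliberately omitted, so this illustrates the quoted settings rather than the authors' implementation.

import torch
import torch.nn as nn

# Representation and context layers wired with the hyperparameters quoted above:
# 300-d GloVe word embeddings, 20-d character embeddings composed into 50-d word
# vectors by an LSTM, BiLSTM context layers with hidden size 100, dropout 0.1.
class RepresentationEncoder(nn.Module):
    def __init__(self, word_vocab_size, char_vocab_size, glove_weights=None):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, 300)     # 300-d, GloVe-initialized
        if glove_weights is not None:
            self.word_emb.weight.data.copy_(glove_weights)
        self.char_emb = nn.Embedding(char_vocab_size, 20)      # 20-d character vectors
        self.char_lstm = nn.LSTM(20, 50, batch_first=True)     # compose chars into 50-d
        self.context = nn.LSTM(350, 100, batch_first=True,
                               bidirectional=True)             # BiLSTM, hidden size 100
        self.dropout = nn.Dropout(0.1)                         # dropout ratio 0.1

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars)
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * t, c, 20)
        _, (h, _) = self.char_lstm(chars)                      # last hidden state per word
        char_repr = h[-1].view(b, t, 50)
        words = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        ctx, _ = self.context(self.dropout(words))
        return self.dropout(ctx)                               # (batch, seq_len, 200)

Training would then minimize cross entropy with the quoted optimizer settings, e.g. torch.optim.Adam(model.parameters(), lr=0.001) together with nn.CrossEntropyLoss().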