Convolutional Neural Network Architectures for Matching Natural Language Sentences

Authors: Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical study on a variety of matching tasks demonstrates the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models. Section 5 (Experiments): We report the performance of the proposed models on three matching tasks of different nature, and compare it with that of other competitor models.
Researcher Affiliation | Collaboration | Department of Computer Science & Technology, Harbin Institute of Technology Shenzhen Graduate School, Xili, China (baotianchina@gmail.com, qingcai.chen@hitsz.edu.cn); Noah's Ark Lab, Huawei Technologies Co. Ltd., Sha Tin, Hong Kong (lu.zhengdong@huawei.com, hangli.hl@huawei.com)
Pseudocode | No | The paper describes the proposed architectures with figures and mathematical equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | Our project page: http://www.noahlab.com.hk/technology/Learning2Match.html (This is a project page, not an explicit code repository for the described methodology.)
Open Datasets | Yes | We use 50-dimensional word embedding trained with the Word2Vec [14]: the embedding for English words (Section 5.2 & 5.4) is learnt on Wikipedia (~1B words), while that for Chinese words (Section 5.3) is learnt on Weibo data (~300M words). Basically, we take a sentence from Reuters [12] with two balanced clauses... We trained our model with 4.5 million original (tweet, response) pairs collected from Weibo, a major Chinese microblog service [26]. Here we use the benchmark MSRP dataset [17].
Dataset Splits | No | For regularization, we find that for both architectures, early stopping [16] is enough for models with medium size and large training sets (with over 500K instances).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions tools and techniques like Word2Vec [14], ReLU [7], and dropout [8], but does not provide specific version numbers for software dependencies required for replication.
Experiment Setup | Yes | We use 50-dimensional word embedding trained with the Word2Vec [14]. We use 3-word window throughout all experiments, but test various numbers of feature maps (typically from 200 to 500), for optimal performance. ARC-II models for all tasks have eight layers (three for convolution, three for pooling, and two for MLP), while ARC-I performs better with less layers (two for convolution, two for pooling, and two for MLP) and more hidden nodes. We use ReLU [7] as the activation function for all of the models (convolution and MLP), which yields comparable or better results to sigmoid-like functions, but converges faster. We use stochastic gradient descent for the optimization of models. All the proposed models perform better with mini-batch (100~200 in sizes). (A sketch of the ARC-I configuration implied by these settings follows the table.)
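
To make the quoted experiment settings concrete, the sketch below instantiates the ARC-I configuration they describe: 50-dimensional word embeddings, a 3-word convolution window, two convolution and two pooling layers, a two-layer MLP, and ReLU throughout. This is an illustrative reconstruction in PyTorch rather than the authors' code; the class names, the fixed sentence length, the padding choice, and the MLP hidden width are assumptions, and PyTorch itself postdates the paper.

```python
# Minimal ARC-I-style matcher, assuming the hyperparameters quoted in the
# "Experiment Setup" row above. Illustrative only; not the authors' code.
import torch
import torch.nn as nn


class ArcISentenceEncoder(nn.Module):
    """Encodes one sentence with conv -> pool -> conv -> pool, ARC-I style."""

    def __init__(self, vocab_size, embed_dim=50, num_feature_maps=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 3-word window convolutions over the word-embedding sequence.
        self.conv1 = nn.Conv1d(embed_dim, num_feature_maps, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(num_feature_maps, num_feature_maps, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)  # width-2 max pooling
        self.relu = nn.ReLU()

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = self.pool(self.relu(self.conv1(x)))     # first conv + pooling layer
        x = self.pool(self.relu(self.conv2(x)))     # second conv + pooling layer
        return x.flatten(1)                         # fixed-length sentence vector


class ArcIMatcher(nn.Module):
    """Concatenates the two sentence vectors and scores the pair with a 2-layer MLP."""

    def __init__(self, vocab_size, sentence_len=32, embed_dim=50,
                 num_feature_maps=300, hidden_dim=200):
        super().__init__()
        self.encoder = ArcISentenceEncoder(vocab_size, embed_dim, num_feature_maps)
        # Two width-2 poolings shrink the sequence length by a factor of 4.
        feat_dim = num_feature_maps * (sentence_len // 4)
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),               # matching score s(x, y)
        )

    def forward(self, sent_x, sent_y):
        hx, hy = self.encoder(sent_x), self.encoder(sent_y)
        return self.mlp(torch.cat([hx, hy], dim=1)).squeeze(-1)
```

Per the quoted text, training would use stochastic gradient descent with mini-batches of 100~200 instances; the paper optimizes a ranking-style objective over matched and mismatched sentence pairs, which is not shown in this sketch.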