Convolutional Neural Network Architectures for Matching Natural Language Sentences
Authors: Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical study on a variety of matching tasks demonstrates the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models. From Section 5 (Experiments): We report the performance of the proposed models on three matching tasks of different nature, and compare it with that of other competitor models. |
| Researcher Affiliation | Collaboration | Department of Computer Science & Technology, Harbin Institute of Technology Shenzhen Graduate School, Xili, China (baotianchina@gmail.com, qingcai.chen@hitsz.edu.cn); Noah's Ark Lab, Huawei Technologies Co. Ltd., Sha Tin, Hong Kong (lu.zhengdong@huawei.com, hangli.hl@huawei.com) |
| Pseudocode | No | The paper describes the proposed architectures with figures and mathematical equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our project page: http://www.noahlab.com.hk/technology/Learning2Match.html (This is a project page, not an explicit code repository for the described methodology.) |
| Open Datasets | Yes | We use 50-dimensional word embedding trained with the Word2Vec [14]: the embedding for English words (Section 5.2 & 5.4) is learnt on Wikipedia (~1B words), while that for Chinese words (Section 5.3) is learnt on Weibo data (~300M words). Basically, we take a sentence from Reuters [12] with two balanced clauses... We trained our model with 4.5 million original (tweet, response) pairs collected from Weibo, a major Chinese microblog service [26]. Here we use the benchmark MSRP dataset [17]. (A minimal embedding-training sketch appears below the table.) |
| Dataset Splits | No | For regularization, we find that for both architectures, early stopping [16] is enough for models with medium size and large training sets (with over 500K instances). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions tools and techniques like Word2Vec [14], ReLU [7], and dropout [8], but does not provide specific version numbers for software dependencies required for replication. |
| Experiment Setup | Yes | We use 50-dimensional word embedding trained with the Word2Vec [14]. We use 3-word window throughout all experiments, but test various numbers of feature maps (typically from 200 to 500) for optimal performance. ARC-II models for all tasks have eight layers (three for convolution, three for pooling, and two for MLP), while ARC-I performs better with fewer layers (two for convolution, two for pooling, and two for MLP) and more hidden nodes. We use ReLU [7] as the activation function for all of the models (convolution and MLP), which yields comparable or better results to sigmoid-like functions, but converges faster. We use stochastic gradient descent for the optimization of models. All the proposed models perform better with mini-batch (100~200 in size). (Model and training-loop sketches appear below the table.) |
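
For concreteness, here is a minimal sketch of the embedding step quoted above: training 50-dimensional Word2Vec vectors with gensim. The corpus file name, window size, and skip-gram choice are assumptions for illustration; the paper only specifies the dimensionality and the corpora (Wikipedia for English, Weibo for Chinese).

```python
# Minimal sketch (assumes gensim >= 4.0): 50-dimensional Word2Vec embeddings,
# matching the dimensionality reported in the paper. "corpus.txt" is a
# hypothetical placeholder for the Wikipedia / Weibo corpora, with one
# tokenized sentence per line.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("corpus.txt")   # streams whitespace-tokenized lines
model = Word2Vec(
    sentences,
    vector_size=50,   # 50-dim embeddings, as in the paper
    window=5,         # context window (not stated in the paper; assumed)
    min_count=5,      # drop rare words (assumed)
    sg=1,             # skip-gram variant (assumed; the paper does not say)
    workers=4,
)
model.wv.save("word2vec_50d.kv")         # reusable KeyedVectors file
```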
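
The experiment-setup row describes ARC-I as a siamese arrangement: a shared sentence encoder with two convolution layers, two pooling layers, a 3-word window, and ReLU activations, followed by a two-layer MLP over the concatenated sentence vectors. The PyTorch sketch below follows that outline; the feature-map count, hidden width, padding, and final global pooling are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ArcIEncoder(nn.Module):
    """Sentence encoder in the style of ARC-I: conv -> pool, twice, over word embeddings."""
    def __init__(self, vocab_size, emb_dim=50, n_maps=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 3-word convolution window, as reported in the paper; 300 feature maps
        # is one value inside the 200-500 range the authors sweep.
        self.conv1 = nn.Conv1d(emb_dim, n_maps, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(n_maps, n_maps, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.act = nn.ReLU()

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # (batch, emb_dim, seq_len)
        x = self.pool(self.act(self.conv1(x)))
        x = self.pool(self.act(self.conv2(x)))
        # Global max-pool to a fixed-size vector (simplification; the paper
        # pads sentences to a fixed length instead).
        return x.max(dim=2).values

class ArcIMatcher(nn.Module):
    """ARC-I-style matcher: shared encoder for both sentences, then a 2-layer MLP."""
    def __init__(self, vocab_size, n_maps=300, hidden=200):
        super().__init__()
        self.encoder = ArcIEncoder(vocab_size, n_maps=n_maps)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_maps, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),                  # scalar matching score
        )

    def forward(self, sent_x, sent_y):
        hx, hy = self.encoder(sent_x), self.encoder(sent_y)
        return self.mlp(torch.cat([hx, hy], dim=1)).squeeze(1)
```

ARC-II, by contrast, convolves over the interaction of the two sentences directly before the MLP; the same building blocks apply, but the sketch above only covers the simpler ARC-I case.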
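
Finally, the paper reports plain stochastic gradient descent with mini-batches of roughly 100 to 200 and, for medium-sized models with large training sets, early stopping as the only regularization. The sketch below wires those pieces together; the margin ranking loss over (query, positive, negative) triples, the batch size of 128, the learning rate, and the user-supplied `evaluate` helper are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, dev_set, evaluate, epochs=30, lr=0.05, patience=3):
    """SGD with mini-batches and patience-based early stopping.

    The margin ranking loss follows the spirit of a large-margin objective over
    (query, positive, negative) triples; batch size, learning rate, and patience
    are illustrative assumptions, and `evaluate` is a user-supplied routine that
    returns a validation loss.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MarginRankingLoss(margin=1.0)
    loader = DataLoader(train_set, batch_size=128, shuffle=True)

    best_dev, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for query, pos, neg in loader:
            s_pos = model(query, pos)
            s_neg = model(query, neg)
            # Push the positive score above the negative score by the margin.
            loss = loss_fn(s_pos, s_neg, torch.ones_like(s_pos))
            opt.zero_grad()
            loss.backward()
            opt.step()

        dev_loss = evaluate(model, dev_set)
        if dev_loss < best_dev:
            best_dev, stale = dev_loss, 0
            torch.save(model.state_dict(), "best_arc1.pt")
        else:
            stale += 1
            if stale >= patience:                  # early stopping, as in the paper
                break
```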