Text Matching as Image Recognition
Authors: Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate its superiority against the baselines. In this section, we conduct experiments on two tasks, i.e. paraphrase identification and paper citation matching, to demonstrate the superiority of MatchPyramid against baselines. |
| Researcher Affiliation | Academia | CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China {pangliang,wanshengxian}@software.ict.ac.cn, {lanyanyan,guojiafeng,junxu,cxq}@ict.ac.cn |
| Pseudocode | No | The paper describes the convolutional neural network operations and scoring function using mathematical equations (e.g., Eq. 6-10) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to publicly released models for baselines (DSSM/CDSSM) but does not provide any statement or link for the source code of their proposed MatchPyramid model. |
| Open Datasets | Yes | Here we use the benchmark MSRP dataset (Dolan and Brockett 2005), which contains 4076 instances for training and 1725 for testing. The [paper citation matching] dataset is collected from a commercial academic website. It contains 838,908 instances (text pairs) in total, where there are 279,636 positive (matched) instances and 559,272 negative (mismatched) instances. |
| Dataset Splits | Yes | Here we use the benchmark MSRP dataset (Dolan and Brockett 2005), which contains 4076 instances for training and 1725 for testing. We split the whole dataset into three parts, 599,196 instances for training, 119,829 for validation and 119,883 for testing. |
| Hardware Specification | No | The paper mentions that the optimization 'can be easily parallelized on single machine with multi-cores' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions techniques and libraries like 'Word2Vec', 'Adagrad', and 'ReLU' and cites their original papers, but does not specify any software names with version numbers for implementation details (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | All these models use two convolutional layers, two max-pooling layers (one of which is a dynamic pooling layer for variable length) and two full connection layers. The number of feature maps is 8 and 16 for the first and second convolutional layer, respectively. While the kernel size is set to be 5×5 and 3×3, respectively. We apply stochastic gradient descent method Adagrad (Duchi, Hazan, and Singer 2011) for the optimization of models. It performs better when we use the mini-batch strategy (32–50 in size). |
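
The Experiment Setup row above quotes concrete layer sizes. Below is a minimal sketch of such a network in PyTorch; the paper does not name its framework, and the class name `MatchPyramidSketch`, the pooled grid size, the hidden width, and the padding choices are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of the architecture quoted above (not the authors' code):
# two conv layers with 8 and 16 feature maps (5x5 and 3x3 kernels), two
# pooling layers (one dynamic), and two fully connected layers.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MatchPyramidSketch(nn.Module):
    def __init__(self, pooled_size=(10, 10), hidden_dim=128, num_classes=2):
        super().__init__()
        # Conv layer 1: 1 input channel (the word-word matching matrix), 8 feature maps, 5x5 kernel.
        self.conv1 = nn.Conv2d(1, 8, kernel_size=5, padding=2)
        # Conv layer 2: 16 feature maps, 3x3 kernel.
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        # "Dynamic pooling": adaptive max-pooling maps variable-length texts to a fixed grid.
        self.dyn_pool = nn.AdaptiveMaxPool2d(pooled_size)
        flat = 16 * (pooled_size[0] // 2) * (pooled_size[1] // 2)
        self.fc1 = nn.Linear(flat, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, emb_a, emb_b):
        # emb_a: (batch, len_a, dim), emb_b: (batch, len_b, dim) word embeddings (e.g. Word2Vec).
        # Matching matrix: dot product between every word pair of the two texts.
        match = torch.bmm(emb_a, emb_b.transpose(1, 2)).unsqueeze(1)  # (batch, 1, len_a, len_b)
        x = F.relu(self.conv1(match))
        x = self.dyn_pool(x)          # dynamic pooling to a fixed-size grid
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)        # second (ordinary) max-pooling layer
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)            # matching score / class logits
```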
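
The optimization details in the same row (Adagrad with mini-batches of 32–50) could be exercised with a training step like the one below, building on the sketch above; the learning rate, loss function, and toy tensor shapes are assumptions, not values from the paper.

```python
# Hypothetical training step: Adagrad with a mini-batch of 32 (within the 32-50
# range quoted above). Loss, learning rate, and data shapes are assumed.
model = MatchPyramidSketch()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)  # lr is an assumption
criterion = nn.CrossEntropyLoss()

# Toy batch: 32 text pairs of lengths 20 and 25, with 50-dim embeddings (all assumed).
emb_a = torch.randn(32, 20, 50)
emb_b = torch.randn(32, 25, 50)
labels = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = criterion(model(emb_a, emb_b), labels)
loss.backward()
optimizer.step()
```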