Duplicate Question Identification by Integrating FrameNet With Neural Networks
Authors: Xiaodong Zhang, Xu Sun, Houfeng Wang
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Quora question pairs dataset demonstrate that the ensemble approach is more effective and outperforms all baselines. |
| Researcher Affiliation | Academia | 1 MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China 2 Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu, 221009, China |
| Pseudocode | No | The paper describes models and equations but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/zxdcs/DQI |
| Open Datasets | Yes | The recently released Quora question pairs (QQP) dataset is adopted in our experiments. ... https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs |
| Dataset Splits | Yes | Because there is not an official partition of train/dev/test set, we shuffle the dataset randomly and split train/dev/test set with a proportion of 8:1:1. The statistics of the question pairs are listed in Table 4. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions software like 'spaCy', 'LightGBM', and 'PyTorch', but it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All hyper-parameters are tuned on the development set. ... The maximum number of leaves in a tree is 700 and minimal number of data in a leaf is 0. The number of boosting round is 10000 and the early stopping round is 100. ... Frame embeddings are 50-dimensional, which are initialized randomly with a uniform distribution between [-1, 1]. ... The dimension of H is 300, thus the dimension of J is 600. The MLP consists of a 200-dimensional hidden layer. The model is trained using Adam (Kingma and Ba 2014) optimization method with the learning rate set to 0.001. The batch size is set to 100. |
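Since the paper does not provide an official train/dev/test partition, the random 8:1:1 split it describes could be sketched as follows. This is a minimal illustration, not the authors' code; the `seed` parameter is an assumption, as the paper does not report one.

```python
import random

def split_8_1_1(pairs, seed=42):
    """Shuffle question pairs and partition them into train/dev/test
    with an 8:1:1 ratio, mirroring the random split the paper describes.
    """
    pairs = list(pairs)
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test
```

Because no seed or official split is published, exact replication of the reported numbers would require either the authors' shuffled indices or averaging over several random splits.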
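The hyper-parameter values quoted above can be collected into configuration dictionaries for reference. The dict keys below are illustrative names chosen here (the paper does not publish a config file), but every value is taken directly from the quoted setup.

```python
# Gradient-boosting settings quoted in the paper (LightGBM).
LIGHTGBM_PARAMS = {
    "num_leaves": 700,           # maximum number of leaves in a tree
    "min_data_in_leaf": 0,       # minimal number of data points in a leaf
    "num_boost_round": 10000,    # number of boosting rounds
    "early_stopping_rounds": 100,
}

# Neural-network settings quoted in the paper.
NEURAL_PARAMS = {
    "frame_embedding_dim": 50,   # initialized uniformly in [-1, 1]
    "hidden_dim_H": 300,         # so the joined representation J is 600-d
    "mlp_hidden_dim": 200,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 100,
}
```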