Duplicate Question Identification by Integrating FrameNet With Neural Networks

Authors: Xiaodong Zhang, Xu Sun, Houfeng Wang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Quora question pairs dataset demonstrate that the ensemble approach is more effective and outperforms all baselines.
Researcher Affiliation | Academia | 1 MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China; 2 Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu, 221009, China
Pseudocode | No | The paper describes models and equations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/zxdcs/DQI
Open Datasets | Yes | The recently released Quora question pairs (QQP) dataset is adopted in our experiments. ... https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
Dataset Splits | Yes | Because there is not an official partition of train/dev/test set, we shuffle the dataset randomly and split train/dev/test set with a proportion of 8:1:1. The statistics of the question pairs are listed in Table 4.
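The quoted 8:1:1 shuffle-and-split procedure can be sketched in plain Python. Note the seed and the exact boundary arithmetic are assumptions for illustration; the paper does not report a random seed.

```python
import random

def split_dataset(pairs, seed=0):
    """Shuffle question pairs and split into train/dev/test at 8:1:1."""
    rng = random.Random(seed)  # seed is an assumption; the paper reports none
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test

# With 100 items this yields an 80/10/10 split.
train, dev, test = split_dataset(range(100))
```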
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions software like 'spaCy', 'LightGBM', and 'PyTorch', but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | All hyper-parameters are tuned on the development set. ... The maximum number of leaves in a tree is 700 and minimal number of data in a leaf is 0. The number of boosting round is 10000 and the early stopping round is 100. ... Frame embeddings are 50-dimensional, which are initialized randomly with a uniform distribution between [-1, 1]. ... The dimension of H is 300, thus the dimension of J is 600. The MLP consists of a 200-dimensional hidden layer. The model is trained using Adam (Kingma and Ba 2014) optimization method with the learning rate set to 0.001. The batch size is set to 100.
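The quoted setup maps onto configuration values roughly as follows. This is a hedged sketch: the LightGBM key names follow that library's documented parameter conventions, the neural-model dictionary keys are purely illustrative, and anything not quoted above would be a library default rather than a value from the paper.

```python
# Gradient-boosting settings quoted from the paper, expressed with
# LightGBM's documented parameter names (an assumption about naming).
lgb_params = {
    "num_leaves": 700,        # maximum number of leaves in a tree
    "min_data_in_leaf": 0,    # minimal number of data in a leaf
}
num_boost_round = 10000       # number of boosting rounds
early_stopping_rounds = 100   # early stopping rounds

# Neural-model settings quoted from the paper (key names are illustrative).
nn_params = {
    "frame_embedding_dim": 50,  # initialized uniformly in [-1, 1]
    "hidden_dim_H": 300,        # so the joined representation J is 600-d
    "mlp_hidden_dim": 200,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 100,
}
```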