Duplicate Question Identification by Integrating FrameNet With Neural Networks

Authors: Xiaodong Zhang, Xu Sun, Houfeng Wang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Quora question pairs dataset demonstrate that the ensemble approach is more effective and outperforms all baselines.
Researcher Affiliation | Academia | 1 MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China; 2 Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu, 221009, China
Pseudocode | No | The paper describes models and equations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/zxdcs/DQI
Open Datasets | Yes | The recently released Quora question pairs (QQP) dataset is adopted in our experiments. ... https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
Dataset Splits | Yes | Because there is not an official partition of train/dev/test set, we shuffle the dataset randomly and split train/dev/test set with a proportion of 8:1:1. The statistics of the question pairs are listed in Table 4.
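The quoted 8:1:1 shuffle-and-split procedure can be sketched in plain Python. Note the seed and the exact boundary arithmetic are assumptions for illustration; the paper does not report a random seed.

```python
import random

def split_dataset(pairs, seed=0):
    """Shuffle question pairs and split into train/dev/test at 8:1:1."""
    rng = random.Random(seed)  # seed is an assumption; the paper reports none
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test

# With 100 items this yields an 80/10/10 split.
train, dev, test = split_dataset(range(100))
```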
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions software like 'spaCy', 'LightGBM', and 'PyTorch', but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | All hyper-parameters are tuned on the development set. ... The maximum number of leaves in a tree is 700 and minimal number of data in a leaf is 0. The number of boosting round is 10000 and the early stopping round is 100. ... Frame embeddings are 50-dimensional, which are initialized randomly with a uniform distribution between [-1, 1]. ... The dimension of H is 300, thus the dimension of J is 600. The MLP consists of a 200-dimensional hidden layer. The model is trained using Adam (Kingma and Ba 2014) optimization method with the learning rate set to 0.001. The batch size is set to 100.
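The quoted setup maps onto configuration values roughly as follows. This is a hedged sketch: the LightGBM key names follow that library's documented parameter conventions, the neural-model dictionary keys are purely illustrative, and anything not quoted above would be a library default rather than a value from the paper.

```python
# Gradient-boosting settings quoted from the paper, expressed with
# LightGBM's documented parameter names (an assumption about naming).
lgb_params = {
    "num_leaves": 700,        # maximum number of leaves in a tree
    "min_data_in_leaf": 0,    # minimal number of data in a leaf
}
num_boost_round = 10000       # number of boosting rounds
early_stopping_rounds = 100   # early stopping rounds

# Neural-model settings quoted from the paper (key names are illustrative).
nn_params = {
    "frame_embedding_dim": 50,  # initialized uniformly in [-1, 1]
    "hidden_dim_H": 300,        # so the joined representation J is 600-d
    "mlp_hidden_dim": 200,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "batch_size": 100,
}
```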