A Multi-View Fusion Neural Network for Answer Selection

Authors: Lei Sha, Xiaodong Zhang, Feng Qian, Baobao Chang, Zhifang Sui

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on the WikiQA and SemEval-2016 CQA datasets demonstrate that our proposed model outperforms the state-of-the-art methods."
Researcher Affiliation | Academia | Lei Sha, Xiaodong Zhang, Feng Qian, Baobao Chang, Zhifang Sui (contributed equally), Key Laboratory of Computational Linguistics, Ministry of Education, School of Electronics Engineering and Computer Science, Peking University. {shalei, zxdcs, nickqian, chbb, szf}@pku.edu.cn
Pseudocode | No | The paper describes the model's architecture and computations using mathematical equations and diagrams, but it does not include a distinct pseudocode block or algorithm.
Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We report the performance of our proposed method on two datasets: WikiQA (Yang, Yih, and Meek 2015) and SemEval-2016 CQA (Nakov et al. 2016)."
Dataset Splits | Yes | Table 2 of the paper reports the statistics of the answer selection datasets ("For WikiQA, we remove all the questions that has no right answers"). WikiQA (Train / Dev / Test): 873 / 126 / 243 questions and 20,360 / 2,733 / 6,165 answers. SemEval-2016 CQA (Train / Dev / Test): 4,879 / 244 / 327 questions and 36,198 / 2,440 / 3,270 answers.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions pre-trained GloVe embeddings and Stanford CoreNLP but does not specify version numbers for these or any other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | "We use 100-dim word embeddings (d = 100) and we set the hidden layer length dh = 500. The external memory length dM is set to 400. The margin is set to 0.1. To compute the network parameter θ, we maximize the max-margin likelihood J(θ) through stochastic gradient descent over shuffled mini-batches with the Adadelta (Zeiler 2012) update rule."
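The experiment setup above describes a max-margin training objective with a margin of 0.1. A minimal sketch of such an objective, assuming a standard hinge-style ranking loss over question-answer scores (the paper's multi-view fusion scoring network itself is not reproduced here, and the scores below are illustrative stand-ins):

```python
import numpy as np

# Margin value quoted from the paper's setup; everything else here is a
# hypothetical illustration, not the paper's actual model.
MARGIN = 0.1

def max_margin_loss(pos_scores, neg_scores, margin=MARGIN):
    """Hinge-style ranking loss: each correct answer should outscore an
    incorrect answer to the same question by at least `margin`."""
    return float(np.maximum(0.0, margin - pos_scores + neg_scores).mean())

# Toy scores for two question/answer-pair examples.
pos = np.array([1.0, 0.05])   # scores assigned to correct answers
neg = np.array([0.0, 0.0])    # scores assigned to incorrect answers
loss = max_margin_loss(pos, neg)  # -> 0.025: only the second pair violates the margin
```

In practice the gradient of this loss would be followed with the Adadelta update rule over shuffled mini-batches, as the quoted setup states.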