Neural Question Generation with Answer Pivot

Authors: Bingning Wang, Xiaochuan Wang, Ting Yao, Qi Zhang, Jingfang Xu

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct thorough experiments on SQuAD (Rajpurkar et al. 2016). The proposed model consistently outperforms the pure answer-aware or answer-agnostic counterparts in terms of the automatic evaluation metric. The human assessment demonstrates that our proposed model could generate both answerable and diverse questions.
Researcher Affiliation | Industry | Xiaochuan Wang, Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu — Sogou Inc., Beijing, 100084, China. {wxc, wangbingning, yaoting, qizhang, xujingfang}@sogou-inc.com
Pseudocode | No | The paper describes its methodology in text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Codes and analysis of this paper will be publicly available.
Open Datasets | Yes | In this paper, we conduct the experiments on the SQuAD dataset that has been widely used for NQG evaluation. The SQuAD dataset consists of 23,215 paragraphs from 536 articles in Wikipedia, with nearly 100,000 crowd-sourced question-answer pairs.
Dataset Splits | Yes | Since the test set is not publicly available, we follow Zhou et al. (2017) and randomly split the dev set into two parts, using them as the development set and test set for NQG. Thus, the training, development and test sets contain 86,635, 8,965 and 8,964 examples respectively. (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | In this paper, we preprocess the text with the sentencepiece (Kudo and Richardson 2018) tokenizer with a vocabulary size of 30,000. We initialize the word embeddings with the skip-gram algorithm. We use the Adam (Kingma and Ba 2014) optimizer with default hyperparameters to optimize the models. The paper names tools such as SentencePiece and the Adam optimizer but provides no version numbers for them or for any other key software dependencies. (A tokenizer-training sketch appears after the table.)
Experiment Setup | Yes | We truncate the paragraph to a maximum sequence length of 256, and the question to a maximum sequence length of 30. For the encoder and decoder, we set the number of layers to 4 and the number of heads to 6; the hidden size is set to 384. Dropout with rate 0.2 is applied to the output of the word embedding layer and the multi-head attention layer. We use the Adam (Kingma and Ba 2014) optimizer with default hyperparameters to optimize the models. During inference, we adopt top-5 beam search with a length penalty of 0.9. For the answer pivot weight λ in Eq. 11, we first optimize the MLE loss and then gradually increase it from 0 to 0.5, which is tuned on the development set.
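
For readers who want to reproduce the split reported in the Dataset Splits row, here is a minimal sketch, assuming the original SQuAD dev set is available as a list of examples. The function name, variable names, and random seed are illustrative, not taken from the paper.

```python
import random

# Minimal sketch (not the authors' script) of the Zhou et al. (2017)-style split:
# the public SQuAD dev set is shuffled and cut in half to serve as the NQG
# development and test sets (8,965 + 8,964 examples); the training set is kept as-is.
def split_dev(dev_examples, seed=42):
    examples = list(dev_examples)
    random.Random(seed).shuffle(examples)
    mid = (len(examples) + 1) // 2          # first half takes the extra example
    return examples[:mid], examples[mid:]   # (nqg_dev, nqg_test)
```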
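The Software Dependencies row mentions a sentencepiece tokenizer with a 30,000-token vocabulary. Below is a minimal sketch of training and using such a tokenizer with the SentencePiece Python package; the input file name and model prefix are hypothetical placeholders, not taken from the paper.

```python
import sentencepiece as spm

# Train a SentencePiece model with a 30,000-token vocabulary (as reported),
# then tokenize a sample sentence with it.
spm.SentencePieceTrainer.train(
    input="corpus.txt",      # assumed plain-text dump of paragraphs and questions
    model_prefix="nqg_sp",   # placeholder model name
    vocab_size=30000,
)
sp = spm.SentencePieceProcessor(model_file="nqg_sp.model")
print(sp.encode("What is the answer pivot?", out_type=str))
```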
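The Experiment Setup row can also be read as a configuration sketch. In the snippet below, only the numbers come from the quoted text; the use of torch.nn.Transformer as a stand-in for the paper's encoder-decoder, the default feed-forward size, and the linear schedule (and step count) for the answer-pivot weight λ are assumptions.

```python
import torch
from torch import nn

# Hyperparameters reported in the Experiment Setup row.
CONFIG = dict(
    vocab_size=30000,      # sentencepiece vocabulary
    num_layers=4,          # encoder and decoder layers
    num_heads=6,
    hidden_size=384,
    dropout=0.2,           # on word embeddings and multi-head attention
    max_src_len=256,       # paragraph truncation
    max_tgt_len=30,        # question truncation
    beam_size=5,           # top-5 beam search at inference
    length_penalty=0.9,
    lambda_max=0.5,        # answer-pivot weight in Eq. 11, ramped up from 0
)

# Stand-in encoder-decoder; the paper's actual architecture may differ
# (e.g. feed-forward size, embedding sharing, copy mechanism).
model = nn.Transformer(
    d_model=CONFIG["hidden_size"],
    nhead=CONFIG["num_heads"],
    num_encoder_layers=CONFIG["num_layers"],
    num_decoder_layers=CONFIG["num_layers"],
    dropout=CONFIG["dropout"],
    batch_first=True,
)
optimizer = torch.optim.Adam(model.parameters())  # default Adam hyperparameters

def pivot_weight(step, warmup_steps=10000):
    """Ramp the answer-pivot weight lambda from 0 to lambda_max.

    The paper only says lambda is gradually increased from 0 to 0.5 after the
    MLE loss is optimized; the linear shape and warmup_steps are assumptions.
    """
    return CONFIG["lambda_max"] * min(1.0, step / warmup_steps)
```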