Improving Question Generation with Sentence-Level Semantic Matching and Answer Position Inferring

Authors: Xiyao Ma, Qile Zhu, Yanlin Zhou, Xiaolin Li (pp. 8464-8471)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly. We conduct extensive experiments on SQuAD and MS MARCO dataset, demonstrating the superiority of our proposed model compared with existing approaches.
Researcher Affiliation Collaboration Xiyao Ma (1), Qile Zhu (1), Yanlin Zhou (1), Xiaolin Li (2); (1) NSF Center for Big Learning, University of Florida; (2) AI Institute, Tongdun Technology
Pseudocode No The paper describes methods using equations and textual descriptions, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block or figure.
Open Source Code No The paper does not provide an explicit statement about releasing the source code for the described methodology, nor does it include a link to a code repository.
Open Datasets Yes Dataset SQuAD V1.1 dataset contains 536 Wikipedia articles and more than 100K questions posed about the articles (Rajpurkar et al. 2016). ... MS MARCO contains more than one million queries along with answers either generated by human or selected from passages (Nguyen et al. 2016).
Dataset Splits Yes Following the baseline (Zhou et al. 2017), we use the training dataset (86635) to train our model, and we split the dev dataset into dev (8965) and test dataset (8964) with a ratio of 50%-50% for evaluation. MS MARCO... We split them into train set (86039), dev set (9480), and test set (7921) for model training and evaluation purpose.
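The 50%-50% dev/test split described above can be sketched as follows. This is a minimal illustration: the shuffling and fixed seed are assumptions, since the report only gives the resulting split sizes.

```python
import random

def split_dev(dev_samples, seed=0):
    """Split a dev set into new dev/test halves (50%-50%).

    The shuffle and fixed seed are assumptions for illustration; the
    report only states the resulting sizes of the two halves.
    """
    samples = list(dev_samples)
    random.Random(seed).shuffle(samples)
    mid = (len(samples) + 1) // 2  # an odd total puts the extra sample in dev
    return samples[:mid], samples[mid:]

# Reported SQuAD sizes: 17929 dev samples -> 8965 dev / 8964 test
new_dev, new_test = split_dev(range(17929))
```

With 17929 input samples this reproduces the reported 8965/8964 split sizes.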
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments.
Software Dependencies No The paper mentions tools like 'LSTM', 'Glove vector', and 'Adam Optimizer' but does not specify software versions for these or other key dependencies (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup Yes Following NQG++ (Zhou et al. 2017), we conduct our experiments on the preprocessed data provided by (Zhou et al. 2017). We use a 1-layer LSTM as the RNN cell for both the encoder and the decoder, and a bidirectional LSTM is used for the encoder. The hidden size of the encoder and decoder is 512. We use a 300-dimension pre-trained GloVe vector as the word embedding (Pennington, Socher, and Manning 2014). As in NQG++ (Zhou et al. 2017), the dimensions of the lexical features and the answer position are 16. We use the Adam (Kingma and Ba 2014) optimizer for model training with an initial learning rate of 0.001, and we halve it when the validation score does not improve. During the training of the Sentence-level Semantic Matching module, we sample the negative sentences and questions from nearby data samples in the same batch, because the preprocessed data (Zhou et al. 2017) lack information about which data samples are from the same passage. We compute our total loss function with α of 1 and β of 2. Models are trained for 20 epochs with mini-batches of size 32. We choose the model achieving the best performance on the dev dataset.
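The reported training choices above (a total loss weighted with α = 1 and β = 2, halving the learning rate when validation does not improve, and in-batch negative sampling) can be sketched as follows. The individual loss-term names and the shift-by-one negative pairing are assumptions: the source only says negatives come from nearby samples in the same batch.

```python
def total_loss(gen_loss, matching_loss, position_loss, alpha=1.0, beta=2.0):
    """Weighted sum of the generation loss and the two auxiliary losses.

    The decomposition into these three named terms is an assumption for
    illustration; the report only gives alpha = 1 and beta = 2.
    """
    return gen_loss + alpha * matching_loss + beta * position_loss

def step_lr(lr, improved):
    """Halve the learning rate when the validation score does not improve."""
    return lr if improved else lr / 2.0

def in_batch_negatives(batch):
    """Pair each sample with its neighbor in the batch as a negative.

    A shift-by-one pairing is one simple way to take negatives from
    nearby samples in the same batch, as the setup describes.
    """
    return [batch[(i + 1) % len(batch)] for i in range(len(batch))]
```

For example, `total_loss(1.0, 0.5, 0.25)` combines the terms as 1.0 + 1·0.5 + 2·0.25, and `step_lr(0.001, improved=False)` yields the halved rate 0.0005.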