S-Net: From Answer Extraction to Answer Synthesis for Machine Reading Comprehension

Authors: Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, Ming Zhou

AAAI 2018

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Experiments show that our extraction-then-synthesis method outperforms state-of-the-art methods. We conduct experiments on the MS-MARCO dataset. The results show our extraction-then-synthesis framework outperforms our baselines and all other existing methods in terms of ROUGE-L and BLEU-1.
Researcher Affiliation | Collaboration | State Key Laboratory of Software Development Environment, Beihang University, Beijing, China; Microsoft Research, Beijing, China
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide any links to open-source code, nor does it state that the code will be made publicly available.
Open Datasets | Yes | We conduct our experiments on the MS-MARCO dataset (Nguyen et al. 2016).
Dataset Splits | Yes | The data has been split into a training set (82,326 pairs), a development set (10,047 pairs) and a test set (9,650 pairs). (See the split-size check after the table.)
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU or GPU models, or cloud computing instances).
Software Dependencies | No | The paper mentions using GloVe embeddings and general neural network components (GRU, Bi-GRU) but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | For answer extraction, we use 300-dimensional uncased pre-trained GloVe embeddings... Hidden vector length is set to 150 for all layers. We also apply dropout (Srivastava et al. 2014) between layers, with dropout rate 0.1. The weight r is set to 0.8. For answer synthesis, we use an identical vocabulary set for the input and output collected from the training data. We set the vocabulary size to 30,000... All word embeddings are updated during the training. We set the word embedding size to 300, set the feature embedding size of start and end positions of the extracted snippet to 50, and set all GRU hidden state sizes to 150. The model is optimized using AdaDelta (Zeiler 2012) with an initial learning rate of 1.0. (These values are collected in the configuration sketch after the table.)
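
Because the Dataset Splits row quotes exact split sizes, a local copy of MS-MARCO v1.1 can be checked against them. This is a minimal sketch; the file names and the one-JSON-object-per-line layout are assumptions, as the paper does not specify them:

```python
# Minimal sketch: verify a local copy of MS-MARCO v1.1 against the split
# sizes quoted in the paper (82,326 / 10,047 / 9,650 query-answer pairs).
# File names and the JSON-lines layout below are assumptions.
EXPECTED = {"train": 82_326, "dev": 10_047, "test": 9_650}

for split, expected in EXPECTED.items():
    path = f"{split}_v1.1.json"  # hypothetical path
    with open(path, encoding="utf-8") as f:
        count = sum(1 for line in f if line.strip())  # one pair per line
    status = "ok" if count == expected else "MISMATCH"
    print(f"{split}: {count} pairs (expected {expected}) -> {status}")
```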
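
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object for a reimplementation attempt. Only the values come from the paper; the field names and grouping below are our own:

```python
# Sketch of a configuration collecting the hyperparameters quoted above.
# Field names are ours; only the values are taken from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class SNetConfig:
    # Evidence extraction
    word_embedding_dim: int = 300       # 300-d uncased pre-trained GloVe
    hidden_size: int = 150              # hidden vector length for all layers
    dropout_rate: float = 0.1           # dropout applied between layers
    loss_weight_r: float = 0.8          # the paper's joint-loss weight r
    # Answer synthesis
    vocab_size: int = 30_000            # shared input/output vocabulary
    synthesis_embedding_dim: int = 300  # word embeddings, updated in training
    position_feature_dim: int = 50      # start/end features of the snippet
    synthesis_hidden_size: int = 150    # all GRU hidden state sizes
    # Optimization
    optimizer: str = "adadelta"         # AdaDelta (Zeiler 2012)
    initial_learning_rate: float = 1.0

config = SNetConfig()
```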