Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Authors: Yu Chen, Lingfei Wu, Mohammed J. Zaki
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is end-to-end trainable and achieves new state-of-the-art scores, outperforming existing methods by a significant margin on the standard SQuAD benchmark. We evaluate our proposed model against state-of-the-art methods on the SQuAD dataset (Rajpurkar et al., 2016). Our full model has two variants, G2S_sta+BERT+RL and G2S_dyn+BERT+RL, which adopt static graph construction or dynamic graph construction, respectively. For model settings and sensitivity analysis, please refer to Appendix B and C. Table 1 shows the automatic evaluation results comparing our proposed models against other state-of-the-art baseline methods. Table 3: Ablation study on the SQuAD split-2 test set. |
| Researcher Affiliation | Collaboration | Yu Chen, Department of Computer Science, Rensselaer Polytechnic Institute (cheny39@rpi.edu); Lingfei Wu, IBM Research (lwu@email.wm.edu); Mohammed J. Zaki, Department of Computer Science, Rensselaer Polytechnic Institute (zaki@cs.rpi.edu) |
| Pseudocode | No | The paper describes the model architecture and equations in detail, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The implementation of our model is publicly available at https://github.com/hugochan/RL-based-Graph2Seq-for-NQG. |
| Open Datasets | Yes | We evaluate our proposed model against state-of-the-art methods on the SQuAD dataset (Rajpurkar et al., 2016). SQuAD contains more than 100K questions posed by crowd workers on 536 Wikipedia articles. |
| Dataset Splits | Yes | For fair comparison with previous methods, we evaluated our model on both data split-1 (Song et al., 2018a), which contains 75,500/17,934/11,805 (train/development/test) examples, and data split-2 (Zhou et al., 2017), which contains 86,635/8,965/8,964 examples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It only describes the software and training configurations. |
| Software Dependencies | No | The paper mentions using GloVe embeddings, BERT embeddings, the OpenNMT library, and the Adam optimizer, but it does not specify version numbers for any of these software components or libraries, which would be required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We keep and fix the 300-dim GloVe vectors for the most frequent 70,000 words in the training set. We compute the 1024-dim BERT embeddings on the fly for each word in text using a (trainable) weighted sum of all BERT layer outputs. The embedding sizes of case, POS and NER tags are set to 3, 12 and 8, respectively. We set the hidden state size of BiLSTM to 150 so that the concatenated state size for both directions is 300. The size of all other hidden layers is set to 300. We apply a variational dropout (Kingma et al., 2015) rate of 0.4 after word embedding layers and 0.3 after RNN layers. We set the neighborhood size to 10 for dynamic graph construction. The number of GNN hops is set to 3. During training, in each epoch, we set the initial teacher forcing probability to 0.75 and exponentially increase it to 0.75 × 0.9999^i, where i is the training step. We set α in the reward function to 0.1, γ in the mixed loss function to 0.99, and the coverage loss ratio λ to 0.4. We use Adam (Kingma & Ba, 2014) as the optimizer, and the learning rate is set to 0.001 in the pretraining stage and 0.00001 in the fine-tuning stage. We reduce the learning rate by a factor of 0.5 if the validation BLEU-4 score stops improving for three epochs. We stop the training when no improvement is seen for 10 epochs. We clip the gradient at length 10. The batch size is set to 60 and 50 on data split-1 and split-2, respectively. The beam search width is set to 5. All hyperparameters are tuned on the development set. |
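
To make the dataset splits and hyperparameters quoted above easier to scan, the sketch below gathers them into a single Python configuration object. The names (`SQUAD_SPLITS`, `CONFIG`) and structure are illustrative assumptions, not taken from the authors' repository (https://github.com/hugochan/RL-based-Graph2Seq-for-NQG).

```python
# Hypothetical configuration sketch; values are copied from the quoted
# "Dataset Splits" and "Experiment Setup" rows above, names are made up here.

SQUAD_SPLITS = {
    "split-1": {"train": 75_500, "dev": 17_934, "test": 11_805},  # Song et al., 2018a
    "split-2": {"train": 86_635, "dev": 8_965, "test": 8_964},    # Zhou et al., 2017
}

CONFIG = {
    "word_embed_dim": 300,            # fixed GloVe vectors, 70,000 most frequent words
    "bert_embed_dim": 1024,           # trainable weighted sum of all BERT layer outputs
    "case_embed_dim": 3,
    "pos_embed_dim": 12,
    "ner_embed_dim": 8,
    "bilstm_hidden": 150,             # 300 after concatenating both directions
    "hidden_size": 300,               # all other hidden layers
    "word_embed_dropout": 0.4,        # variational dropout after word embedding layers
    "rnn_dropout": 0.3,               # variational dropout after RNN layers
    "dynamic_graph_neighborhood": 10,
    "gnn_hops": 3,
    "reward_alpha": 0.1,
    "mixed_loss_gamma": 0.99,
    "coverage_loss_lambda": 0.4,
    "lr_pretrain": 1e-3,              # Adam
    "lr_finetune": 1e-5,
    "grad_clip": 10,
    "batch_size": {"split-1": 60, "split-2": 50},
    "beam_width": 5,
}
```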
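
The training-schedule rules in the Experiment Setup row can also be written out compactly. The PyTorch-flavored sketch below implements the quoted teacher-forcing schedule 0.75 × 0.9999^i and the BLEU-4-driven learning-rate halving via a standard `ReduceLROnPlateau` scheduler; the `mixed_loss` helper assumes the common convex combination of the RL and ML losses with γ = 0.99, which may differ in detail from the authors' implementation.

```python
import torch


def teacher_forcing_prob(step: int, p0: float = 0.75, decay: float = 0.9999) -> float:
    """Teacher-forcing probability schedule quoted above: p_i = 0.75 * 0.9999**i."""
    return p0 * decay ** step


def mixed_loss(loss_ml: torch.Tensor, loss_rl: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Assumed convex combination of the cross-entropy (ML) and RL losses (gamma = 0.99)."""
    return gamma * loss_rl + (1.0 - gamma) * loss_ml


# Learning-rate handling quoted above: halve the LR when validation BLEU-4
# stops improving for three epochs; training stops after 10 epochs without improvement.
model = torch.nn.Linear(4, 4)  # placeholder model, standing in for the Graph2Seq network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # pretraining LR; 1e-5 for fine-tuning
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3)
# Once per epoch: scheduler.step(validation_bleu4)
# Gradient clipping at length 10: torch.nn.utils.clip_grad_norm_(model.parameters(), 10)
```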