Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Authors: Yu Chen, Lingfei Wu, Mohammed J. Zaki
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is end-to-end trainable and achieves new state-of-the-art scores, outperforming existing methods by a significant margin on the standard SQuAD benchmark. We evaluate our proposed model against state-of-the-art methods on the SQuAD dataset (Rajpurkar et al., 2016). Our full model has two variants, G2S_sta+BERT+RL and G2S_dyn+BERT+RL, which adopt static graph construction or dynamic graph construction, respectively. For model settings and sensitivity analysis, please refer to Appendix B and C. Table 1 shows the automatic evaluation results comparing our proposed models against other state-of-the-art baseline methods. Table 3: Ablation study on the SQuAD split-2 test set. |
| Researcher Affiliation | Collaboration | Yu Chen, Department of Computer Science, Rensselaer Polytechnic Institute (cheny39@rpi.edu); Lingfei Wu, IBM Research (lwu@email.wm.edu); Mohammed J. Zaki, Department of Computer Science, Rensselaer Polytechnic Institute (zaki@cs.rpi.edu) |
| Pseudocode | No | The paper describes the model architecture and equations in detail, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The implementation of our model is publicly available at https://github.com/hugochan/RL-based-Graph2Seq-for-NQG. |
| Open Datasets | Yes | We evaluate our proposed model against state-of-the-art methods on the SQuAD dataset (Rajpurkar et al., 2016). SQuAD contains more than 100K questions posed by crowd workers on 536 Wikipedia articles. |
| Dataset Splits | Yes | For fair comparison with previous methods, we evaluated our model on both data split-1 (Song et al., 2018a), which contains 75,500/17,934/11,805 (train/development/test) examples, and data split-2 (Zhou et al., 2017), which contains 86,635/8,965/8,964 examples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It only describes the software and training configurations. |
| Software Dependencies | No | The paper mentions using GloVe embeddings, BERT embeddings, the OpenNMT library, and the Adam optimizer, but it does not specify version numbers for any of these software components or libraries, which would be required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We keep and fix the 300-dim GloVe vectors for the most frequent 70,000 words in the training set. We compute the 1024-dim BERT embeddings on the fly for each word in text using a (trainable) weighted sum of all BERT layer outputs. The embedding sizes of case, POS and NER tags are set to 3, 12 and 8, respectively. We set the hidden state size of BiLSTM to 150 so that the concatenated state size for both directions is 300. The size of all other hidden layers is set to 300. We apply a variational dropout (Kingma et al., 2015) rate of 0.4 after word embedding layers and 0.3 after RNN layers. We set the neighborhood size to 10 for dynamic graph construction. The number of GNN hops is set to 3. During training, in each epoch, we set the initial teacher forcing probability to 0.75 and exponentially increase it to 0.75 × 0.9999^i, where i is the training step. We set α in the reward function to 0.1, γ in the mixed loss function to 0.99, and the coverage loss ratio λ to 0.4. We use Adam (Kingma & Ba, 2014) as the optimizer, and the learning rate is set to 0.001 in the pretraining stage and 0.00001 in the fine-tuning stage. We reduce the learning rate by a factor of 0.5 if the validation BLEU-4 score stops improving for three epochs. We stop the training when no improvement is seen for 10 epochs. We clip the gradient at length 10. The batch size is set to 60 and 50 on data split-1 and split-2, respectively. The beam search width is set to 5. All hyperparameters are tuned on the development set. |
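
To make the dataset splits and hyperparameters quoted above easier to scan, the sketch below gathers them into a single Python configuration object. The names (`SQUAD_SPLITS`, `CONFIG`) and structure are illustrative assumptions, not taken from the authors' repository (https://github.com/hugochan/RL-based-Graph2Seq-for-NQG).

```python
# Hypothetical configuration sketch; values are copied from the quoted
# "Dataset Splits" and "Experiment Setup" rows above, names are made up here.

SQUAD_SPLITS = {
    "split-1": {"train": 75_500, "dev": 17_934, "test": 11_805},  # Song et al., 2018a
    "split-2": {"train": 86_635, "dev": 8_965, "test": 8_964},    # Zhou et al., 2017
}

CONFIG = {
    "word_embed_dim": 300,            # fixed GloVe vectors, 70,000 most frequent words
    "bert_embed_dim": 1024,           # trainable weighted sum of all BERT layer outputs
    "case_embed_dim": 3,
    "pos_embed_dim": 12,
    "ner_embed_dim": 8,
    "bilstm_hidden": 150,             # 300 after concatenating both directions
    "hidden_size": 300,               # all other hidden layers
    "word_embed_dropout": 0.4,        # variational dropout after word embedding layers
    "rnn_dropout": 0.3,               # variational dropout after RNN layers
    "dynamic_graph_neighborhood": 10,
    "gnn_hops": 3,
    "reward_alpha": 0.1,
    "mixed_loss_gamma": 0.99,
    "coverage_loss_lambda": 0.4,
    "lr_pretrain": 1e-3,              # Adam
    "lr_finetune": 1e-5,
    "grad_clip": 10,
    "batch_size": {"split-1": 60, "split-2": 50},
    "beam_width": 5,
}
```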
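
The training-schedule rules in the Experiment Setup row can also be written out compactly. The PyTorch-flavored sketch below implements the quoted teacher-forcing schedule 0.75 × 0.9999^i and the BLEU-4-driven learning-rate halving via a standard `ReduceLROnPlateau` scheduler; the `mixed_loss` helper assumes the common convex combination of the RL and ML losses with γ = 0.99, which may differ in detail from the authors' implementation.

```python
import torch


def teacher_forcing_prob(step: int, p0: float = 0.75, decay: float = 0.9999) -> float:
    """Teacher-forcing probability schedule quoted above: p_i = 0.75 * 0.9999**i."""
    return p0 * decay ** step


def mixed_loss(loss_ml: torch.Tensor, loss_rl: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Assumed convex combination of the cross-entropy (ML) and RL losses (gamma = 0.99)."""
    return gamma * loss_rl + (1.0 - gamma) * loss_ml


# Learning-rate handling quoted above: halve the LR when validation BLEU-4
# stops improving for three epochs; training stops after 10 epochs without improvement.
model = torch.nn.Linear(4, 4)  # placeholder model, standing in for the Graph2Seq network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # pretraining LR; 1e-5 for fine-tuning
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3)
# Once per epoch: scheduler.step(validation_bleu4)
# Gradient clipping at length 10: torch.nn.utils.clip_grad_norm_(model.parameters(), 10)
```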