Difficulty Controllable Generation of Reading Comprehension Questions

Authors: Yifan Gao, Lidong Bing, Wang Chen, Michael Lyu, Irwin King

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU, but also comply with the specified difficulty labels. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; (2) R&D Center Singapore, Machine Intelligence Technology, Alibaba DAMO Academy |
| Pseudocode | No | The paper describes the model architecture and components in text and diagrams (Figure 2), but no pseudocode blocks are present. |
| Open Source Code | No | The paper references GitHub links for R-Net and BiDAF (e.g., 'https://github.com/HKUST-KnowComp/R-Net'), but these are third-party tools used in the work, not the source code for the authors' own method. There is no explicit statement or link for their own implementation. |
| Open Datasets | Yes | SQuAD [Rajpurkar et al., 2016] is a reading comprehension dataset containing 100,000+ questions on Wikipedia articles. The answer to each question is a text fragment from the corresponding input passage. We employ SQuAD questions to prepare our experimental dataset. |
| Dataset Splits | Yes | Our prepared dataset is split according to articles of the SQuAD data, and Table 2 provides the detailed statistics. Across the training, validation and test sets, the splitting ratio is around 7:1:1, and the easy sample ratio is around 58% for all three. |
| Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used to run its experiments. |
| Software Dependencies | No | The paper mentions specific models such as R-Net and BiDAF, but it provides no version numbers for any software dependencies, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | The embedding dimensions for the position embedding and the global difficulty variable, i.e. dp and dd, are set to 50 and 10 respectively. We use the maximum relative distance L = 20 in the position embedding. We adopt teacher forcing in encoder-decoder training and use the ground truth difficulty labels. In testing, we select the model with the lowest perplexity, and beam search with size 3 is employed for question generation. |
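The reported setup can be made concrete with a minimal sketch, not the authors' code, of how the stated hyperparameters (dp = 50, dd = 10, L = 20, beam size 3) might map onto embedding lookup tables. The table names, the clipping of relative distances to [-L, L], and the two-way easy/hard labeling are assumptions for illustration only.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
D_P = 50           # position embedding dimension d_p
D_D = 10           # global difficulty embedding dimension d_d
MAX_REL_DIST = 20  # maximum relative distance L in the position embedding
BEAM_SIZE = 3      # beam width used at test time

rng = np.random.default_rng(0)

# One row per clipped relative distance in [-L, L] (2L + 1 rows),
# plus a two-row table for the global difficulty variable (easy / hard).
position_table = rng.normal(size=(2 * MAX_REL_DIST + 1, D_P))
difficulty_table = rng.normal(size=(2, D_D))  # row 0 = easy, row 1 = hard

def position_embedding(token_idx: int, answer_idx: int) -> np.ndarray:
    """Embed a token's relative distance to the answer span,
    clipped to [-MAX_REL_DIST, MAX_REL_DIST]."""
    rel = int(np.clip(token_idx - answer_idx, -MAX_REL_DIST, MAX_REL_DIST))
    return position_table[rel + MAX_REL_DIST]

def difficulty_embedding(label: str) -> np.ndarray:
    """Embed the ground-truth difficulty label (teacher forcing in training)."""
    return difficulty_table[0 if label == "easy" else 1]
```

A token 100 positions away from the answer shares an embedding with one 20 positions away, since both distances clip to L = 20; this keeps the table small regardless of passage length.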