Difficulty Controllable Generation of Reading Comprehension Questions

Authors: Yifan Gao, Lidong Bing, Wang Chen, Michael Lyu, Irwin King

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU, but also comply with the specified difficulty labels. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; (2) R&D Center Singapore, Machine Intelligence Technology, Alibaba DAMO Academy |
| Pseudocode | No | The paper describes the model architecture and components in text and diagrams (Figure 2), but no pseudocode blocks are present. |
| Open Source Code | No | The paper references GitHub links for R-Net and BiDAF (e.g., 'https://github.com/HKUST-KnowComp/R-Net'), but these are third-party tools used in the work, not the source code for the authors' own method. There is no explicit statement or link for their own implementation. |
| Open Datasets | Yes | SQuAD [Rajpurkar et al., 2016] is a reading comprehension dataset containing 100,000+ questions on Wikipedia articles. The answer to each question is a text fragment from the corresponding input passage. We employ SQuAD questions to prepare our experimental dataset. |
| Dataset Splits | Yes | Our prepared dataset is split according to articles of the SQuAD data, and Table 2 provides the detailed statistics. Across the training, validation and test sets, the splitting ratio is around 7:1:1, and the easy sample ratio is around 58% for all three. |
| Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used to run its experiments. |
| Software Dependencies | No | The paper mentions specific models such as R-Net and BiDAF, but it provides no version numbers for any software dependencies, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | The embedding dimensions for the position embedding and the global difficulty variable, i.e. dp and dd, are set to 50 and 10 respectively. We use the maximum relative distance L = 20 in the position embedding. We adopt teacher forcing in encoder-decoder training and use the ground truth difficulty labels. In testing, we select the model with the lowest perplexity, and beam search with size 3 is employed for question generation. |
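The reported setup can be made concrete with a minimal sketch, not the authors' code, of how the stated hyperparameters (dp = 50, dd = 10, L = 20, beam size 3) might map onto embedding lookup tables. The table names, the clipping of relative distances to [-L, L], and the two-way easy/hard labeling are assumptions for illustration only.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
D_P = 50           # position embedding dimension d_p
D_D = 10           # global difficulty embedding dimension d_d
MAX_REL_DIST = 20  # maximum relative distance L in the position embedding
BEAM_SIZE = 3      # beam width used at test time

rng = np.random.default_rng(0)

# One row per clipped relative distance in [-L, L] (2L + 1 rows),
# plus a two-row table for the global difficulty variable (easy / hard).
position_table = rng.normal(size=(2 * MAX_REL_DIST + 1, D_P))
difficulty_table = rng.normal(size=(2, D_D))  # row 0 = easy, row 1 = hard

def position_embedding(token_idx: int, answer_idx: int) -> np.ndarray:
    """Embed a token's relative distance to the answer span,
    clipped to [-MAX_REL_DIST, MAX_REL_DIST]."""
    rel = int(np.clip(token_idx - answer_idx, -MAX_REL_DIST, MAX_REL_DIST))
    return position_table[rel + MAX_REL_DIST]

def difficulty_embedding(label: str) -> np.ndarray:
    """Embed the ground-truth difficulty label (teacher forcing in training)."""
    return difficulty_table[0 if label == "easy" else 1]
```

A token 100 positions away from the answer shares an embedding with one 20 positions away, since both distances clip to L = 20; this keeps the table small regardless of passage length.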