Knowledge-Driven Distractor Generation for Cloze-Style Multiple Choice Questions

Authors: Siyu Ren, Kenny Q. Zhu

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on a new dataset across four domains show that our framework yields distractors outperforming previous methods both by automatic and human evaluation.
Researcher Affiliation | Academia | Siyu Ren, Kenny Q. Zhu*, Shanghai Jiao Tong University, Shanghai, China; roy0702@sjtu.edu.cn, kzhu@cs.sjtu.edu.cn
Pseudocode | No | The paper describes the framework components and their functionalities but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code at https://github.com/DRSY/DGen.
Open Datasets | Yes | We compile a cross-domain cloze-style MCQ dataset covering science, trivia, vocabulary and common sense, which can be used as a benchmark for future research in DG. We compile and open-source a diverse and comprehensive benchmark dataset for training and evaluating distractor generation models (Section ). Code at https://github.com/DRSY/DGen.
Dataset Splits | Yes | The dataset is randomly divided into train/valid/test with a ratio of 8:1:1 (a split sketch follows the table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions software components like NLTK, LDA, AdaBoost, LambdaMART, Word2Vec, ELMo, ReVerb, the Stanford Parser, and BERT, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The size of a concept set C is set to 20. Topic distributions π_{a,q} and γ_c are obtained using LDA pre-trained on a Wikipedia dump, and K is set to 100. The dimensionality of the feature vector l is 33. Unigram frequency is calculated on the Wikipedia dump. For the training of DS, negative examples are sampled from the top 100 candidates extracted by CSG, excluding those that are within the ground truths. At test time, DS takes as input the top 30 candidates extracted by CSG and 30 candidates sampled from WordNet's own vocabulary having the same POS tag. All hyperparameters are tuned on the dev set. (A candidate-pool sketch follows the table.)
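As a reproduction aid for the Dataset Splits row, here is a minimal Python sketch of an 8:1:1 random split, assuming the dataset is loaded as a flat list of examples; the function name and the fixed seed are illustrative choices, not taken from the released code.

```python
# Minimal sketch (not the authors' script) of a random 8:1:1 train/valid/test split.
import random

def split_8_1_1(examples, seed=42):  # seed is an illustrative choice
    """Shuffle the examples and split them into train/valid/test (8:1:1)."""
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    return (examples[:n_train],
            examples[n_train:n_train + n_valid],
            examples[n_train + n_valid:])
```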
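For the Experiment Setup row, the sketch below shows one way the DS training negatives and test-time candidate pool could be assembled. The `csg_rank` interface, the POS handling, and all function names are assumptions for illustration; only the pool sizes (100, 30, 30), the exclusion of ground-truth distractors, and the same-POS sampling from WordNet follow the quoted setup.

```python
# Minimal sketch (assumptions noted above) of DS candidate-pool construction.
import random
from nltk.corpus import wordnet as wn  # requires NLTK with the WordNet corpus installed

def training_negatives(csg_rank, question, answer, ground_truths, top_k=100):
    """Sample DS training negatives from the top-k CSG candidates,
    excluding candidates that appear among the ground-truth distractors."""
    gold = set(ground_truths)
    candidates = csg_rank(question, answer)[:top_k]  # hypothetical CSG ranking interface
    return [c for c in candidates if c not in gold]

def test_candidate_pool(csg_rank, question, answer, answer_pos, n_csg=30, n_wordnet=30):
    """Build the DS test-time pool: top CSG candidates plus words drawn from
    WordNet's own vocabulary that share the answer's POS tag (e.g. 'n')."""
    pool = csg_rank(question, answer)[:n_csg]
    same_pos = list({lemma.name().replace('_', ' ')
                     for synset in wn.all_synsets(pos=answer_pos)
                     for lemma in synset.lemmas()})
    pool += random.sample(same_pos, min(n_wordnet, len(same_pos)))
    return pool
```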