Knowledge-Driven Distractor Generation for Cloze-Style Multiple Choice Questions
Authors: Siyu Ren, Kenny Q. Zhu
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a new dataset across four domains show that our framework yields distractors outperforming previous methods both by automatic and human evaluation. |
| Researcher Affiliation | Academia | Siyu Ren, Kenny Q. Zhu* Shanghai Jiao Tong University Shanghai, China roy0702@sjtu.edu.cn, kzhu@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes the framework components and their functionalities but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at https://github.com/DRSY/DGen. |
| Open Datasets | Yes | We compile a cross-domain cloze-style MCQ dataset covering science, trivia, vocabulary and common sense, which can be used as a benchmark for future research in DG. We compile and open-source a diverse and comprehensive benchmark dataset for training and evaluating distractor generation model (Section ). Code at https://github.com/DRSY/DGen. |
| Dataset Splits | Yes | The dataset is randomly divided into train/valid/test with a ratio of 8:1:1. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions software components like NLTK, LDA, AdaBoost, LambdaMART, Word2Vec, ELMo, ReVerb, Stanford Parser, and BERT, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The size of a concept set C is set to be 20. Topic distributions π_{a,q} and γ_c are obtained using LDA pre-trained on a Wikipedia dump and K is set to 100. The dimensionality of the feature vector l is 33. Unigram frequency is calculated on a Wikipedia dump. For the training of DS, negative examples are sampled using the top 100 candidates extracted by CSG, excluding those that are within ground truths. At test time, DS takes as input the top 30 candidates extracted by CSG and 30 candidates sampled from WordNet's own vocabulary having the same POS tag. All hyperparameters are tuned on the dev set. |
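The 8:1:1 train/valid/test split reported above can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name, seed, and use of Python's `random` module are our assumptions.

```python
import random

def split_dataset(examples, seed=0):
    """Randomly divide examples into train/valid/test with an 8:1:1 ratio,
    matching the split ratio reported in the paper."""
    rng = random.Random(seed)          # fixed seed for reproducibility (our choice)
    shuffled = examples[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_valid = int(n * 0.1)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_dataset(list(range(1000)))
print(len(train), len(valid), len(test))  # 800 100 100
```

Fixing the shuffle seed is what makes such a random split reproducible across runs; the paper itself does not state which seed (if any) was used.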