Knowledge-Driven Distractor Generation for Cloze-Style Multiple Choice Questions

Authors: Siyu Ren, Kenny Q. Zhu

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on a new dataset across four domains show that our framework yields distractors outperforming previous methods both by automatic and human evaluation.
Researcher Affiliation | Academia | Siyu Ren, Kenny Q. Zhu*, Shanghai Jiao Tong University, Shanghai, China; roy0702@sjtu.edu.cn, kzhu@cs.sjtu.edu.cn
Pseudocode | No | The paper describes the framework components and their functionalities but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code at https://github.com/DRSY/DGen.
Open Datasets | Yes | We compile a cross-domain cloze-style MCQ dataset covering science, trivia, vocabulary and common sense, which can be used as a benchmark for future research in DG. We compile and open-source a diverse and comprehensive benchmark dataset for training and evaluating distractor generation models (Section ). Code at https://github.com/DRSY/DGen.
Dataset Splits | Yes | The dataset is randomly divided into train/valid/test with a ratio of 8:1:1 (a split sketch follows the table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions software components like NLTK, LDA, AdaBoost, LambdaMART, Word2Vec, ELMo, ReVerb, the Stanford Parser, and BERT, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The size of a concept set C is set to 20. Topic distributions π_{a,q} and γ_c are obtained using LDA pre-trained on a Wikipedia dump, and K is set to 100. The dimensionality of the feature vector l is 33. Unigram frequency is calculated on the Wikipedia dump. For the training of DS, negative examples are sampled from the top 100 candidates extracted by CSG, excluding those that are within the ground truths. At test time, DS takes as input the top 30 candidates extracted by CSG and 30 candidates sampled from WordNet's own vocabulary having the same POS tag. All hyperparameters are tuned on the dev set. (A candidate-pool sketch follows the table.)
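As a reproduction aid for the Dataset Splits row, here is a minimal Python sketch of an 8:1:1 random split, assuming the dataset is loaded as a flat list of examples; the function name and the fixed seed are illustrative choices, not taken from the released code.

```python
# Minimal sketch (not the authors' script) of a random 8:1:1 train/valid/test split.
import random

def split_8_1_1(examples, seed=42):  # seed is an illustrative choice
    """Shuffle the examples and split them into train/valid/test (8:1:1)."""
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    return (examples[:n_train],
            examples[n_train:n_train + n_valid],
            examples[n_train + n_valid:])
```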
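For the Experiment Setup row, the sketch below shows one way the DS training negatives and test-time candidate pool could be assembled. The `csg_rank` interface, the POS handling, and all function names are assumptions for illustration; only the pool sizes (100, 30, 30), the exclusion of ground-truth distractors, and the same-POS sampling from WordNet follow the quoted setup.

```python
# Minimal sketch (assumptions noted above) of DS candidate-pool construction.
import random
from nltk.corpus import wordnet as wn  # requires NLTK with the WordNet corpus installed

def training_negatives(csg_rank, question, answer, ground_truths, top_k=100):
    """Sample DS training negatives from the top-k CSG candidates,
    excluding candidates that appear among the ground-truth distractors."""
    gold = set(ground_truths)
    candidates = csg_rank(question, answer)[:top_k]  # hypothetical CSG ranking interface
    return [c for c in candidates if c not in gold]

def test_candidate_pool(csg_rank, question, answer, answer_pos, n_csg=30, n_wordnet=30):
    """Build the DS test-time pool: top CSG candidates plus words drawn from
    WordNet's own vocabulary that share the answer's POS tag (e.g. 'n')."""
    pool = csg_rank(question, answer)[:n_csg]
    same_pos = list({lemma.name().replace('_', ' ')
                     for synset in wn.all_synsets(pos=answer_pos)
                     for lemma in synset.lemmas()})
    pool += random.sample(same_pos, min(n_wordnet, len(same_pos)))
    return pool
```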