Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari (pp. 13507-13515)

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We provide empirical results across five commonsense question answering tasks with data generated from five external knowledge resources.
Researcher Affiliation Collaboration 1Language Technologies Institute, School of Computer Science, Carnegie Mellon University 2Information Sciences Institute, Viterbi School of Engineering, University of Southern California 3Human-Machine Collaboration, Bosch Research Pittsburgh
Pseudocode No No explicit pseudocode or algorithm block found.
Open Source Code Yes We make our code and resulting datasets available to the community to facilitate future research in this direction. https://github.com/Mayer123/HyKAS-CSKG
Open Datasets Yes We generate questions, answers, and distractor options from five KGs: ATOMIC, ConceptNet, WordNet, Visual Genome (Krishna et al. 2017), and Wikidata (Vrandečić and Krötzsch 2014), found in the Commonsense Knowledge Graph (CSKG) (Ilievski et al. 2020).
Dataset Splits Yes For ATOMIC, this procedure generates 535K QA pairs for training and 60K for development. For CWWV, the training set contains 157K and the dev set has 8K QA pairs. We randomly sample 5% of generated questions as the development set, while the other 95% are used for training.
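The 95%/5% split described above can be sketched as a simple random hold-out. This is an illustrative sketch, not the released HyKAS-CSKG code; the function name, seed, and data format are assumptions.

```python
import random

def split_qa_pairs(qa_pairs, dev_fraction=0.05, seed=0):
    """Randomly hold out a fraction of generated QA pairs as a dev set
    (5% dev / 95% train, as described in the paper). Illustrative only:
    the actual sampling code in the released repository may differ."""
    rng = random.Random(seed)
    shuffled = list(qa_pairs)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    dev, train = shuffled[:n_dev], shuffled[n_dev:]
    return train, dev

# Example with 1,000 synthetic QA pairs -> 950 train / 50 dev
train, dev = split_qa_pairs([{"id": i} for i in range(1000)])
print(len(train), len(dev))
```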
Hardware Specification No No specific hardware details (GPU/CPU models, memory) are mentioned in the main text. The paper only mentions “computing infrastructure in the appendix”, but this information is not directly provided in the main body.
Software Dependencies No No specific software dependencies with version numbers are provided in the main text. The paper mentions using the Transformers library, but without a version.
Experiment Setup Yes The finetuned LMs are trained for a single epoch on our synthetic QA set. For Adv-filter, we train the models for 5 epochs to compensate for less training data.