Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering
Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari
AAAI 2021, pages 13507-13515 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical results across five commonsense question answering tasks with data generated from five external knowledge resources. |
| Researcher Affiliation | Collaboration | ¹Language Technologies Institute, School of Computer Science, Carnegie Mellon University; ²Information Sciences Institute, Viterbi School of Engineering, University of Southern California; ³Human-Machine Collaboration, Bosch Research Pittsburgh |
| Pseudocode | No | No explicit pseudocode or algorithm block found. |
| Open Source Code | Yes | We make our code and resulting datasets available to the community to facilitate future research in this direction: https://github.com/Mayer123/HyKAS-CSKG |
| Open Datasets | Yes | We generate questions, answers, and distractor options from five KGs: ATOMIC, ConceptNet, WordNet, Visual Genome (Krishna et al. 2017), and Wikidata (Vrandečić and Krötzsch 2014), found in the Commonsense Knowledge Graph (CSKG) (Ilievski et al. 2020). |
| Dataset Splits | Yes | For ATOMIC, this procedure generates 535K QA pairs for training and 60K for development. For CWWV, the training set contains 157K and the dev set has 8K QA pairs. We randomly sample 5% of the generated questions as the development set, while the other 95% are used for training. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) are mentioned in the main text. The paper states that computing infrastructure is described in the appendix, but this information is not provided in the main body. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided in the main text. The paper mentions using the Transformers library but does not give a version. |
| Experiment Setup | Yes | The finetuned LMs are trained for a single epoch on our synthetic QA set. For Adv-filter, we train the models for 5 epochs to compensate for less training data. (See the fine-tuning sketch below the table.) |
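
The Dataset Splits row quotes a random 95%/5% train/dev partition of the generated questions. Below is a minimal sketch of such a split, assuming the generated QA pairs are stored one JSON object per line; the file name `generated_qa.jsonl` and the record schema are hypothetical, not the authors' exact format.

```python
# Minimal sketch of a random 95%/5% train/dev split of synthetic QA pairs.
# File name and record schema are assumptions, not the authors' exact setup.
import json
import random

def split_qa_pairs(qa_pairs, dev_fraction=0.05, seed=42):
    """Randomly hold out `dev_fraction` of the generated questions for development."""
    pairs = list(qa_pairs)
    random.Random(seed).shuffle(pairs)
    n_dev = int(len(pairs) * dev_fraction)
    return pairs[n_dev:], pairs[:n_dev]  # (train, dev)

if __name__ == "__main__":
    # Hypothetical input: one JSON object per generated question.
    with open("generated_qa.jsonl") as f:
        qa_pairs = [json.loads(line) for line in f]
    train, dev = split_qa_pairs(qa_pairs)
    print(f"{len(train)} train / {len(dev)} dev QA pairs")
```

Fixing the shuffle seed keeps the held-out 5% reproducible across runs, which matters when the dev set is used for model selection.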
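The Experiment Setup row describes fine-tuning LMs for a single epoch on the synthetic QA set. The following is a minimal sketch of single-epoch multiple-choice fine-tuning with the HuggingFace Transformers library; the model name, multiple-choice head, learning rate, batch size, and data format are all assumptions for illustration, and the authors' repository (HyKAS-CSKG) defines the actual training objective.

```python
# Minimal sketch of one-epoch multiple-choice fine-tuning with Transformers.
# Model, hyperparameters, and example schema are assumptions, not the paper's
# exact configuration (see the HyKAS-CSKG repository for the real setup).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMultipleChoice.from_pretrained("roberta-large")

def collate(batch):
    # Each example: {"question": str, "choices": [str, ...], "label": int};
    # all examples in a batch are assumed to have the same number of choices.
    n_choices = len(batch[0]["choices"])
    questions = [ex["question"] for ex in batch for _ in ex["choices"]]
    choices = [c for ex in batch for c in ex["choices"]]
    enc = tokenizer(questions, choices, truncation=True, padding=True,
                    return_tensors="pt")
    # Reshape flat (batch * n_choices, seq_len) encodings to
    # (batch, n_choices, seq_len) as expected by the multiple-choice head.
    inputs = {k: v.view(len(batch), n_choices, -1) for k, v in enc.items()}
    labels = torch.tensor([ex["label"] for ex in batch])
    return inputs, labels

def train_one_epoch(dataset, lr=1e-5, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        collate_fn=collate)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for inputs, labels in loader:
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Per the quoted setup, `train_one_epoch` would be called once on the synthetic QA set, and five times for the smaller Adv-filter subset to compensate for less training data.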