Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari (pp. 13507-13515)

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We provide empirical results across five commonsense question answering tasks with data generated from five external knowledge resources.
Researcher Affiliation Collaboration 1Language Technologies Institute, School of Computer Science, Carnegie Mellon University 2Information Sciences Institute, Viterbi School of Engineering, University of Southern California 3Human-Machine Collaboration, Bosch Research Pittsburgh
Pseudocode No No explicit pseudocode or algorithm block found.
Open Source Code Yes We make our code and resulting datasets available to the community to facilitate future research in this direction. https://github.com/Mayer123/HyKAS-CSKG
Open Datasets Yes We generate questions, answers, and distractor options from five KGs: ATOMIC, ConceptNet, WordNet, Visual Genome (Krishna et al. 2017), and Wikidata (Vrandečić and Krötzsch 2014), found in the Commonsense Knowledge Graph (CSKG) (Ilievski et al. 2020).
Dataset Splits Yes For ATOMIC, this procedure generates 535K QA pairs for training and 60K for development. For CWWV, the training set contains 157K and the dev set has 8K QA pairs. We randomly sample 5% of generated questions as the development set, while the other 95% are used for training.
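The 95%/5% split described above can be sketched as a simple random hold-out. This is an illustrative sketch, not the released HyKAS-CSKG code; the function name, seed, and data format are assumptions.

```python
import random

def split_qa_pairs(qa_pairs, dev_fraction=0.05, seed=0):
    """Randomly hold out a fraction of generated QA pairs as a dev set
    (5% dev / 95% train, as described in the paper). Illustrative only:
    the actual sampling code in the released repository may differ."""
    rng = random.Random(seed)
    shuffled = list(qa_pairs)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    dev, train = shuffled[:n_dev], shuffled[n_dev:]
    return train, dev

# Example with 1,000 synthetic QA pairs -> 950 train / 50 dev
train, dev = split_qa_pairs([{"id": i} for i in range(1000)])
print(len(train), len(dev))
```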
Hardware Specification No No specific hardware details (GPU/CPU models, memory) are mentioned in the main text. The paper only mentions “computing infrastructure in the appendix”, but this information is not directly provided in the main body.
Software Dependencies No No specific software dependencies with version numbers are provided in the main text. The paper mentions using the Transformers library, but without a version.
Experiment Setup Yes The finetuned LMs are trained for a single epoch on our synthetic QA set. For Adv-filter, we train the models for 5 epochs to compensate for less training data.