Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation
Authors: Ning Bian, Xianpei Han, Bo Chen, Le Sun
AAAI 2021, pp. 12574-12582
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To answer these questions, we benchmark knowledge-enhanced CQA by conducting extensive experiments on multiple standard CQA datasets using a simple and effective knowledge-to-text transformation framework. |
| Researcher Affiliation | Academia | Ning Bian (1,3), Xianpei Han (1,2,*), Bo Chen (1,2), Le Sun (1,2,*). Affiliations: (1) Chinese Information Processing Laboratory and (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (3) University of Chinese Academy of Sciences, Beijing, China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for open-source code availability for the described methodology. |
| Open Datasets | Yes | We use CommonsenseQA dataset v1.11 (Talmor et al. 2019) as the primary dataset, and adopt the Winograd Schema Challenge (WSC, Levesque, Davis, and Morgenstern 2012), HellaSWAG (Zellers et al. 2019), and SOCIAL IQa (Sap et al. 2019b) as secondary datasets. ... We use ConceptNet 5 (Speer, Chin, and Havasi 2017) as the KB for benchmarking... |
| Dataset Splits | No | The paper mentions 'CQA training data' and 'dev set' but does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific pretrained language models (e.g., BERT-Large, RoBERTa-Large) but does not provide specific version numbers for underlying software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA. |
| Experiment Setup | Yes | For knowledge retrieval, we use knowledge paths within 2 hops (K = 2). In paraphrasing-based transformation, we use the top 1 paraphrasing result (M = 1). For MRC models, we initialize them with the official pretrained language models (BERT-Large, RoBERTa-Large, XLNet-Large, and ALBERT-XXLarge) and fine-tune them using CQA training data. The output layers have a 1024-dimensional hidden layer with a tanh activation function. All models are trained using Adam with a learning rate of 5e-6. *(Illustrative sketches of the 2-hop retrieval step and this fine-tuning head follow the table.)* |
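
Since the paper releases no code, the following is a minimal sketch of what the reported 2-hop knowledge-path retrieval (K = 2) could look like over an in-memory slice of ConceptNet-style triples. The triples, concept names, and the breadth-first enumeration are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical ConceptNet-style (head, relation, tail) triples; the real
# system retrieves from ConceptNet 5, which is far larger.
TRIPLES = [
    ("revolving_door", "AtLocation", "bank"),
    ("bank", "UsedFor", "storing_money"),
    ("revolving_door", "UsedFor", "entering"),
]

def build_graph(triples):
    """Index triples by head concept for outgoing-edge lookup."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph

def knowledge_paths(graph, source, target, max_hops=2):
    """Enumerate relation paths from source to target within max_hops
    (the paper sets K = 2)."""
    paths = []
    frontier = [(source, [])]  # (current concept, path of triples so far)
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, tail in graph[node]:
                new_path = path + [(node, rel, tail)]
                if tail == target:
                    paths.append(new_path)
                next_frontier.append((tail, new_path))
        frontier = next_frontier
    return paths

graph = build_graph(TRIPLES)
print(knowledge_paths(graph, "revolving_door", "storing_money"))
# [[('revolving_door', 'AtLocation', 'bank'), ('bank', 'UsedFor', 'storing_money')]]
```

Each retrieved path would then be transformed into natural-language text (with the top 1 paraphrasing result kept, M = 1) before being fed to the MRC reader.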
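
The reported fine-tuning details (a 1024-dimensional hidden layer with tanh activation on top of a pretrained encoder, trained with Adam at a learning rate of 5e-6) can likewise be sketched in PyTorch. The `MRCScorer` name, the [CLS] pooling, and scoring each answer candidate with a scalar output are assumptions beyond what the paper specifies.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MRCScorer(nn.Module):
    """Scores one (question + knowledge text, candidate answer) encoding.

    Only the 1024-d tanh hidden layer, the optimizer, and the learning
    rate below come from the paper; the rest is a plausible filling-in.
    """
    def __init__(self, encoder_name="bert-large-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size      # 1024 for BERT-Large
        self.hidden = nn.Linear(dim, 1024)         # 1024-d hidden layer
        self.out = nn.Linear(1024, 1)              # scalar score per candidate

    def forward(self, input_ids, attention_mask):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = enc.last_hidden_state[:, 0]          # [CLS] pooling (assumed)
        return self.out(torch.tanh(self.hidden(cls))).squeeze(-1)

model = MRCScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)  # lr from the paper
```

At training time one would score every candidate answer for a question and apply a cross-entropy loss over the candidates; the paper does not spell out this step, so it is an assumption here.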