Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation

Authors: Ning Bian, Xianpei Han, Bo Chen, Le Sun (pp. 12574-12582)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To answer these questions, we benchmark knowledge-enhanced CQA by conducting extensive experiments on multiple standard CQA datasets using a simple and effective knowledge-to-text transformation framework.
Researcher Affiliation | Academia | Ning Bian (1,3), Xianpei Han (1,2,*), Bo Chen (1,2), Le Sun (1,2,*). 1 Chinese Information Processing Laboratory; 2 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; 3 University of Chinese Academy of Sciences, Beijing, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements or links indicating that source code for the described methodology is openly available.
Open Datasets | Yes | We use the CommonsenseQA dataset v1.11 (Talmor et al. 2019) as the primary dataset, and adopt the Winograd Schema Challenge (WSC; Levesque, Davis, and Morgenstern 2012), HellaSWAG (Zellers et al. 2019), and SOCIAL IQa (Sap et al. 2019b) as secondary datasets. ... We use ConceptNet 5 (Speer, Chin, and Havasi 2017) as the KB for benchmarking...
Dataset Splits | No | The paper mentions 'CQA training data' and 'dev set' but does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using specific pretrained language models (e.g., BERT-Large, RoBERTa-Large) but does not provide specific version numbers for underlying software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA.
Experiment Setup | Yes | For knowledge retrieval, we use knowledge paths within 2 hops (K = 2). In paraphrasing-based transformation, we use the top 1 paraphrasing result (M = 1). For MRC models, we initialize them with the official pretrained language models (BERT-Large, RoBERTa-Large, XLNet-Large, and ALBERT-XXLarge) and fine-tune them using CQA training data. The output layers have a 1024-dimensional hidden layer with a tanh activation function. All models are trained using Adam with a learning rate of 5e-6.
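
To make the Open Datasets row easier to act on, here is a minimal, hypothetical loading sketch. It assumes the HuggingFace `datasets` hub mirrors of CommonsenseQA, HellaSWAG, and SOCIAL IQa (dataset ids `commonsense_qa`, `hellaswag`, `social_i_qa`); the paper itself does not state how the data were obtained, so the ids and field names below are assumptions about the hub copies, not the authors' pipeline.

```python
# Hypothetical loading sketch; the paper does not specify how the datasets were obtained.
# Assumes the HuggingFace `datasets` hub mirrors of the benchmarks named above.
from datasets import load_dataset

commonsense_qa = load_dataset("commonsense_qa")  # primary dataset (v1.11 on the hub)
hellaswag = load_dataset("hellaswag")            # secondary dataset
social_iqa = load_dataset("social_i_qa")         # secondary dataset

# Inspect one CommonsenseQA example: question text and its five answer choices.
example = commonsense_qa["train"][0]
print(example["question"], example["choices"]["text"])
```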
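
The Experiment Setup row quotes the key hyperparameters; the sketch below shows how that scoring head and optimizer could look in PyTorch with HuggingFace Transformers. The framework choice, the class name `ChoiceScorer`, and the use of the first token's representation are assumptions not stated in the paper; only K = 2, M = 1, the 1024-dimensional tanh hidden layer, Adam, and the 5e-6 learning rate come from the quoted setup.

```python
# Minimal sketch, not the authors' code. Assumes PyTorch + HuggingFace Transformers;
# the hyperparameters below are the ones quoted in the Experiment Setup row.
import torch
import torch.nn as nn
from transformers import AutoModel

K_HOPS = 2          # knowledge paths within 2 hops (K = 2)
M_PARAPHRASES = 1   # top-1 paraphrasing result (M = 1)

class ChoiceScorer(nn.Module):
    """Scores one (question + knowledge text + answer choice) input sequence."""
    def __init__(self, model_name: str = "roberta-large", hidden_size: int = 1024):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # e.g. RoBERTa-Large
        self.head = nn.Sequential(
            nn.Linear(self.encoder.config.hidden_size, hidden_size),
            nn.Tanh(),                   # 1024-dim hidden layer with tanh activation
            nn.Linear(hidden_size, 1),   # one scalar score per answer choice
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        first_token = out.last_hidden_state[:, 0]   # sequence-level representation
        return self.head(first_token).squeeze(-1)

model = ChoiceScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)  # Adam, lr 5e-6 as reported
```

At inference time the answer choice with the highest score would be selected; the paper's quoted setup does not name the training loss, so a cross-entropy over per-choice scores is a natural but unconfirmed assumption.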