Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation
Authors: Ning Bian, Xianpei Han, Bo Chen, Le Sun
AAAI 2021, pp. 12574-12582
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To answer these questions, we benchmark knowledge-enhanced CQA by conducting extensive experiments on multiple standard CQA datasets using a simple and effective knowledge-to-text transformation framework. |
| Researcher Affiliation | Academia | Ning Bian (1,3), Xianpei Han (1,2,*), Bo Chen (1,2), Le Sun (1,2,*). Affiliations: (1) Chinese Information Processing Laboratory and (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (3) University of Chinese Academy of Sciences, Beijing, China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for open-source code availability for the described methodology. |
| Open Datasets | Yes | We use CommonsenseQA dataset v1.11 (Talmor et al. 2019) as the primary dataset, and adopt the Winograd Schema Challenge (WSC, Levesque, Davis, and Morgenstern 2012), HellaSWAG (Zellers et al. 2019), and SOCIAL IQa (Sap et al. 2019b) as secondary datasets. ... We use ConceptNet 5 (Speer, Chin, and Havasi 2017) as the KB for benchmarking... |
| Dataset Splits | No | The paper mentions 'CQA training data' and 'dev set' but does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific pretrained language models (e.g., BERT-Large, RoBERTa-Large) but does not provide specific version numbers for underlying software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA. |
| Experiment Setup | Yes | For knowledge retrieval, we use knowledge paths within 2 hops (K = 2). In paraphrasing-based transformation, we use the top 1 paraphrasing result (M = 1). For MRC models, we initialize them with the official pretrained language models (BERT-Large, RoBERTa-Large, XLNet-Large, and ALBERT-XXLarge) and fine-tune them using CQA training data. The output layers have a 1024-dimensional hidden layer with a tanh activation function. All models are trained using Adam with a learning rate of 5e-6. *(Illustrative sketches of the 2-hop retrieval step and this fine-tuning head follow the table.)* |
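
Since the paper releases no code, the following is a minimal sketch of what the reported 2-hop knowledge-path retrieval (K = 2) could look like over an in-memory slice of ConceptNet-style triples. The triples, concept names, and the breadth-first enumeration are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical ConceptNet-style (head, relation, tail) triples; the real
# system retrieves from ConceptNet 5, which is far larger.
TRIPLES = [
    ("revolving_door", "AtLocation", "bank"),
    ("bank", "UsedFor", "storing_money"),
    ("revolving_door", "UsedFor", "entering"),
]

def build_graph(triples):
    """Index triples by head concept for outgoing-edge lookup."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph

def knowledge_paths(graph, source, target, max_hops=2):
    """Enumerate relation paths from source to target within max_hops
    (the paper sets K = 2)."""
    paths = []
    frontier = [(source, [])]  # (current concept, path of triples so far)
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, tail in graph[node]:
                new_path = path + [(node, rel, tail)]
                if tail == target:
                    paths.append(new_path)
                next_frontier.append((tail, new_path))
        frontier = next_frontier
    return paths

graph = build_graph(TRIPLES)
print(knowledge_paths(graph, "revolving_door", "storing_money"))
# [[('revolving_door', 'AtLocation', 'bank'), ('bank', 'UsedFor', 'storing_money')]]
```

Each retrieved path would then be transformed into natural-language text (with the top 1 paraphrasing result kept, M = 1) before being fed to the MRC reader.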
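
The reported fine-tuning details (a 1024-dimensional hidden layer with tanh activation on top of a pretrained encoder, trained with Adam at a learning rate of 5e-6) can likewise be sketched in PyTorch. The `MRCScorer` name, the [CLS] pooling, and scoring each answer candidate with a scalar output are assumptions beyond what the paper specifies.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MRCScorer(nn.Module):
    """Scores one (question + knowledge text, candidate answer) encoding.

    Only the 1024-d tanh hidden layer, the optimizer, and the learning
    rate below come from the paper; the rest is a plausible filling-in.
    """
    def __init__(self, encoder_name="bert-large-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size      # 1024 for BERT-Large
        self.hidden = nn.Linear(dim, 1024)         # 1024-d hidden layer
        self.out = nn.Linear(1024, 1)              # scalar score per candidate

    def forward(self, input_ids, attention_mask):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = enc.last_hidden_state[:, 0]          # [CLS] pooling (assumed)
        return self.out(torch.tanh(self.hidden(cls))).squeeze(-1)

model = MRCScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)  # lr from the paper
```

At training time one would score every candidate answer for a question and apply a cross-entropy loss over the candidates; the paper does not spell out this step, so it is an assumption here.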