Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering

Authors: Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu (pp. 8449-8456)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on Commonsense QA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the Commonsense QA dataset. We conduct experiments on the Commonsense QA benchmark dataset. Results show that both the graph-based contextual representation learning module and the graph-based inference module boost the performance.
Researcher Affiliation | Collaboration | Shangwen Lv,1,2 Daya Guo,3 Jingjing Xu,4 Duyu Tang,5 Nan Duan,5 Ming Gong,5 Linjun Shou,5 Daxin Jiang,5 Guihong Cao,5 Songlin Hu1,2 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3Sun Yat-sen University 4Peking University 5Microsoft Corporation
Pseudocode | Yes | Algorithm 1 Topology Sort Algorithm.
Open Source Code | No | The paper does not contain any explicit statement about releasing its own source code or providing a link to it for the methodology described.
Open Datasets | Yes | This paper utilizes Commonsense QA (Talmor et al. 2019), an influential dataset for the commonsense question answering task, for experiments.
Dataset Splits | Yes | The Commonsense QA (Talmor et al. 2019) dataset contains 12,102 examples, including 9,741 for training, 1,221 for development, and 1,140 for testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions software such as XLNet, Spacy, and Elastic Search, but it does not specify their version numbers, which are required for reproducibility (e.g., 'Spacy2' points to a footnote, but no version number is given in either the text or the footnote).
Experiment Setup | Yes | In our best model on the development dataset, we set the batch size to 4 and learning rate to 5e-6. We set max length of input to 256. We use Adam (Kingma and Ba 2014) with β1 = 0.9, β2 = 0.999 for optimization. We set GCN layer to 1. We train our model for 2,800 steps (about one epoch).
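The "Pseudocode" row above refers to the paper's Algorithm 1, a topological sort used to order nodes of the evidence graph. Since the authors released no code, the following is only a minimal sketch of the standard technique (Kahn's algorithm) in Python; the function name and the node/edge representation are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

def topological_sort(nodes, edges):
    """Return the nodes of a DAG in topological order (Kahn's algorithm).

    `nodes` is an iterable of hashable node ids; `edges` is a list of
    (u, v) pairs meaning u must come before v. Raises ValueError if the
    graph contains a cycle, in which case no topological order exists.
    """
    # Build an adjacency list and count incoming edges per node.
    adj = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Start from every node with no incoming edges.
    queue = deque(n for n in adj if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        # Removing u frees up its successors once their last
        # remaining predecessor has been emitted.
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order
```

Any valid ordering satisfies the edge constraints, so downstream consumers (such as a sequential encoder over graph nodes) should rely only on the precedence guarantees, not on one specific ordering.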