Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
Authors: Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu
AAAI 2020, pp. 8449-8456 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Commonsense QA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves state-of-the-art accuracy (75.3%) on the Commonsense QA dataset. We conduct experiments on the Commonsense QA benchmark dataset. Results show that both the graph-based contextual representation learning module and the graph-based inference module boost the performance. |
| Researcher Affiliation | Collaboration | Shangwen Lv (1,2), Daya Guo (3), Jingjing Xu (4), Duyu Tang (5), Nan Duan (5), Ming Gong (5), Linjun Shou (5), Daxin Jiang (5), Guihong Cao (5), Songlin Hu (1,2). (1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (2) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (3) Sun Yat-sen University; (4) Peking University; (5) Microsoft Corporation |
| Pseudocode | Yes | Algorithm 1: Topology Sort Algorithm (a generic sketch of topological sorting appears after this table). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing its own source code or providing a link to it for the methodology described. |
| Open Datasets | Yes | This paper utilizes Commonsense QA (Talmor et al. 2019), an influential dataset for commonsense question answering task for experiments. |
| Dataset Splits | Yes | The Commonsense QA (Talmor et al. 2019) dataset contains 12,102 examples, including 9,741 for training, 1,221 for development, and 1,140 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software like XLNet, Spacy, and Elastic Search, but it does not specify their version numbers, which is required for reproducibility (e.g., 'Spacy2' refers to a footnote, but no version number is given in the text or the footnote). |
| Experiment Setup | Yes | In our best model on the development dataset, we set the batch size to 4 and the learning rate to 5e-6. We set the max input length to 256. We use Adam (Kingma and Ba 2014) with β1 = 0.9, β2 = 0.999 for optimization. We set the number of GCN layers to 1. We train our model for 2,800 steps (about one epoch). (A hedged configuration sketch follows this table.) |
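
The paper's Algorithm 1 is a topological sort used to order the nodes of the extracted evidence graph before encoding. The paper gives its own pseudocode; the sketch below is a generic Kahn's-algorithm implementation, not the authors' code, and the integer node labels and `(u, v)` edge format are assumptions made for illustration.

```python
from collections import deque

def topological_sort(num_nodes, edges):
    """Return one topological order of a DAG via Kahn's algorithm.

    num_nodes: nodes are labeled 0 .. num_nodes - 1 (assumed encoding)
    edges: iterable of (u, v) pairs meaning u must precede v
    """
    adjacency = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for u, v in edges:
        adjacency[u].append(v)
        in_degree[v] += 1

    # Seed the queue with nodes that have no prerequisites.
    queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in adjacency[node]:
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                queue.append(succ)

    if len(order) != num_nodes:
        raise ValueError("graph has a cycle; no topological order exists")
    return order

# Example: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 yields [0, 1, 2, 3]
print(topological_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```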
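
For the experiment-setup row above, the hyperparameter values are quoted from the paper; the snippet below is a minimal PyTorch sketch of how that optimizer configuration would be instantiated. The model here is a stand-in placeholder (the paper's model is XLNet plus a one-layer GCN), so everything except the quoted hyperparameter values is an assumption.

```python
import torch
import torch.nn as nn

# Values reported in the paper.
BATCH_SIZE = 4
LEARNING_RATE = 5e-6
MAX_INPUT_LENGTH = 256
TRAIN_STEPS = 2800  # about one epoch over the 9,741 training examples

# Placeholder module; the paper's actual model is XLNet with one GCN layer.
model = nn.Linear(MAX_INPUT_LENGTH, 5)  # 5 answer choices per Commonsense QA question

# Adam with β1 = 0.9, β2 = 0.999, as stated in the paper.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=LEARNING_RATE,
    betas=(0.9, 0.999),
)
```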