Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
Authors: Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu
AAAI 2020, pp. 8449-8456 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Commonsense QA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves state-of-the-art accuracy (75.3%) on the Commonsense QA dataset. We conduct experiments on the Commonsense QA benchmark dataset. Results show that both the graph-based contextual representation learning module and the graph-based inference module boost the performance. |
| Researcher Affiliation | Collaboration | Shangwen Lv (1,2), Daya Guo (3), Jingjing Xu (4), Duyu Tang (5), Nan Duan (5), Ming Gong (5), Linjun Shou (5), Daxin Jiang (5), Guihong Cao (5), Songlin Hu (1,2). (1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (2) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (3) Sun Yat-sen University; (4) Peking University; (5) Microsoft Corporation |
| Pseudocode | Yes | Algorithm 1: Topology Sort Algorithm (a generic sketch of topological sorting appears after this table). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing its own source code or providing a link to it for the methodology described. |
| Open Datasets | Yes | This paper utilizes Commonsense QA (Talmor et al. 2019), an influential dataset for commonsense question answering task for experiments. |
| Dataset Splits | Yes | The Commonsense QA (Talmor et al. 2019) dataset contains 12,102 examples, including 9,741 for training, 1,221 for development, and 1,140 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software like XLNet, Spacy, and Elastic Search, but it does not specify their version numbers, which is required for reproducibility (e.g., 'Spacy2' refers to a footnote, but no version number is given in the text or the footnote). |
| Experiment Setup | Yes | In our best model on the development dataset, we set the batch size to 4 and the learning rate to 5e-6. We set the max input length to 256. We use Adam (Kingma and Ba 2014) with β1 = 0.9, β2 = 0.999 for optimization. We set the number of GCN layers to 1. We train our model for 2,800 steps (about one epoch). (A hedged configuration sketch follows this table.) |
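
The paper's Algorithm 1 is a topological sort used to order the nodes of the extracted evidence graph before encoding. The paper gives its own pseudocode; the sketch below is a generic Kahn's-algorithm implementation, not the authors' code, and the integer node labels and `(u, v)` edge format are assumptions made for illustration.

```python
from collections import deque

def topological_sort(num_nodes, edges):
    """Return one topological order of a DAG via Kahn's algorithm.

    num_nodes: nodes are labeled 0 .. num_nodes - 1 (assumed encoding)
    edges: iterable of (u, v) pairs meaning u must precede v
    """
    adjacency = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for u, v in edges:
        adjacency[u].append(v)
        in_degree[v] += 1

    # Seed the queue with nodes that have no prerequisites.
    queue = deque(i for i in range(num_nodes) if in_degree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in adjacency[node]:
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                queue.append(succ)

    if len(order) != num_nodes:
        raise ValueError("graph has a cycle; no topological order exists")
    return order

# Example: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 yields [0, 1, 2, 3]
print(topological_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```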
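
For the experiment-setup row above, the hyperparameter values are quoted from the paper; the snippet below is a minimal PyTorch sketch of how that optimizer configuration would be instantiated. The model here is a stand-in placeholder (the paper's model is XLNet plus a one-layer GCN), so everything except the quoted hyperparameter values is an assumption.

```python
import torch
import torch.nn as nn

# Values reported in the paper.
BATCH_SIZE = 4
LEARNING_RATE = 5e-6
MAX_INPUT_LENGTH = 256
TRAIN_STEPS = 2800  # about one epoch over the 9,741 training examples

# Placeholder module; the paper's actual model is XLNet with one GCN layer.
model = nn.Linear(MAX_INPUT_LENGTH, 5)  # 5 answer choices per Commonsense QA question

# Adam with β1 = 0.9, β2 = 0.999, as stated in the paper.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=LEARNING_RATE,
    betas=(0.9, 0.999),
)
```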