SPARQA: Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases
Authors: Yawei Sun, Lingling Zhang, Gong Cheng, Yuzhong Qu
AAAI 2020, pp. 8952-8959
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our approach SPARQA on two standard KBQA datasets with complex questions. We compare with baselines, perform an ablation study of our approach, and finally we analyze each component of our skeleton parser. |
| Researcher Affiliation | Academia | Yawei Sun, Lingling Zhang, Gong Cheng, Yuzhong Qu; National Key Laboratory for Novel Software Technology, Nanjing University, China; {ywsun, llzhang}@smail.nju.edu.cn, {gcheng, yzqu}@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Skeleton Parsing |
| Open Source Code | Yes | Our implementation is open source. https://github.com/nju-websoft/SPARQA |
| Open Datasets | Yes | The experiments were performed on two public datasets involving complex questions. Graph Questions (Su et al. 2016) contains 5,166 questions: 2,558 for training and 2,608 for testing. ... Complex Web Questions version 1.1 (Talmor and Berant 2018b) contains 34,689 questions with a split of 80-10-10 for training, validation, and test sets. ... We have made this resource public to support future research. https://github.com/nju-websoft/SPARQA |
| Dataset Splits | Yes | Graph Questions (Su et al. 2016) contains 5,166 questions: 2,558 for training and 2,608 for testing. ... Complex Web Questions version 1.1 (Talmor and Berant 2018b) contains 34,689 questions with a split of 80-10-10 for training, validation, and test sets. |
| Hardware Specification | No | The paper mentions the BERTBASE model configuration and GloVe embeddings but does not specify any hardware details such as GPU/CPU models, processors, or memory used for experiments. |
| Software Dependencies | No | The paper mentions BERT models (BERTBASE), GloVe embeddings, Stanford's NER, and SUTime as software components, but it does not provide specific version numbers for these, nor for any other core libraries or programming languages used. |
| Experiment Setup | Yes | In our skeleton parser, all the four BERT models were based on BERTBASE (L = 12, H = 768, A = 12, total parameters = 110M). Their hyperparameters were: Split: max sequence length = 32, learning rate = 3e-5, batch size = 32, training epochs = 100; Text Span Prediction: max sequence length = 32, learning rate = 3e-5, batch size = 32, training epochs = 100; Headword Identification: max sequence length = 32, learning rate = 3e-5, batch size = 32, training epochs = 100; Attachment Relation Classification: max sequence length = 64, learning rate = 4e-5, batch size = 32, training epochs = 100. Sentence-Level Scorer. The BERT model was configured as follows: max sequence length = 64, learning rate = 3e-5, batch size = 32, training epochs = 4. Word-Level Scorer. We used 300-dimensional pretrained GloVe embeddings. The neural model was trained with hinge loss with negative sampling size = 300, using Adam with learning rate = 0.001 and batch size = 32. |
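
The Experiment Setup row above fully specifies the reported hyperparameters, so they can be collected into a single configuration for re-runs. The sketch below is a minimal Python rendering of those numbers; the dictionary layout and key names (e.g., `SKELETON_PARSER_CONFIGS`) are our own shorthand, not the authors' code.

```python
# Reported fine-tuning hyperparameters, collected as plain Python dicts.
# Key names are shorthand; the values are taken verbatim from the paper's
# experiment setup.

BERT_BASE = {
    "layers": 12,            # L = 12
    "hidden_size": 768,      # H = 768
    "attention_heads": 12,   # A = 12
    "total_parameters": "110M",
}

SKELETON_PARSER_CONFIGS = {
    # Split, Text Span Prediction, and Headword Identification share settings.
    "split": {"max_seq_length": 32, "learning_rate": 3e-5,
              "batch_size": 32, "epochs": 100},
    "text_span_prediction": {"max_seq_length": 32, "learning_rate": 3e-5,
                             "batch_size": 32, "epochs": 100},
    "headword_identification": {"max_seq_length": 32, "learning_rate": 3e-5,
                                "batch_size": 32, "epochs": 100},
    # Attachment Relation Classification uses longer inputs and a higher
    # learning rate than the other three models.
    "attachment_relation_classification": {"max_seq_length": 64,
                                           "learning_rate": 4e-5,
                                           "batch_size": 32, "epochs": 100},
}

SENTENCE_LEVEL_SCORER = {"max_seq_length": 64, "learning_rate": 3e-5,
                         "batch_size": 32, "epochs": 4}

WORD_LEVEL_SCORER = {"embedding": "GloVe-300d", "loss": "hinge",
                     "negative_sampling_size": 300, "optimizer": "Adam",
                     "learning_rate": 0.001, "batch_size": 32}

if __name__ == "__main__":
    for model, cfg in SKELETON_PARSER_CONFIGS.items():
        print(f"{model}: {cfg}")
```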
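
The word-level scorer is described only by its loss (hinge, with negative sampling size 300), embeddings (300-dimensional GloVe), optimizer (Adam, learning rate 0.001), and batch size (32). Below is a minimal PyTorch sketch of one training step under those settings; the bilinear scoring function, the margin value, and the random stand-in vectors are illustrative assumptions, since the paper does not report the scorer's exact architecture.

```python
import torch
import torch.nn as nn

EMB_DIM = 300        # GloVe dimensionality reported in the paper
NEG_SAMPLES = 300    # negative sampling size reported in the paper
BATCH_SIZE = 32      # batch size reported in the paper
MARGIN = 0.5         # assumption: the paper does not report the margin

class WordLevelScorer(nn.Module):
    """Scores a (question, candidate) embedding pair.

    The bilinear form is an assumption standing in for the paper's
    unspecified word-level scoring model."""
    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def forward(self, q_vec: torch.Tensor, c_vec: torch.Tensor) -> torch.Tensor:
        return self.bilinear(q_vec, c_vec).squeeze(-1)

scorer = WordLevelScorer()
optimizer = torch.optim.Adam(scorer.parameters(), lr=0.001)
hinge = nn.MarginRankingLoss(margin=MARGIN)  # max(0, margin - (pos - neg))

# Random vectors stand in for averaged GloVe embeddings of the question,
# one gold candidate, and 300 sampled negative candidates per question.
q = torch.randn(BATCH_SIZE, EMB_DIM)
pos = torch.randn(BATCH_SIZE, EMB_DIM)
neg = torch.randn(BATCH_SIZE, NEG_SAMPLES, EMB_DIM)

optimizer.zero_grad()
pos_score = scorer(q, pos)                                # shape (32,)
neg_score = scorer(
    q.unsqueeze(1).expand_as(neg).reshape(-1, EMB_DIM),   # repeat question
    neg.reshape(-1, EMB_DIM),                             # per negative
)                                                         # shape (32 * 300,)
pos_expanded = pos_score.repeat_interleave(NEG_SAMPLES)   # align with negatives

# Hinge loss: push each gold score above every negative score by MARGIN.
loss = hinge(pos_expanded, neg_score, torch.ones_like(neg_score))
loss.backward()
optimizer.step()
print(f"hinge loss: {loss.item():.4f}")
```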