TSQA: Tabular Scenario Based Question Answering

Authors: Xiao Li, Yawei Sun, Gong Cheng (pp. 13297-13305)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To support the study of this task, we construct GeoTSQA. This dataset contains 1k real questions contextualized by tabular scenarios in the geography domain. To solve the task, we extend state-of-the-art MRC methods with TTGen, a novel table-to-text generator. It generates sentences from variously synthesized tabular data and feeds the downstream MRC method with the most useful sentences. Its sentence ranking model fuses the information in the scenario, question, and domain knowledge. Our approach outperforms a variety of strong baseline methods on GeoTSQA. We compared our approach with a variety of strong baseline methods for TSQA. We also evaluated our sentence ranking model, which is the core component of our approach.
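The quoted pipeline (generate candidate sentences from the table, rank them, pass only the most useful ones to the MRC reader) can be sketched as follows. This is an illustrative skeleton, not the authors' implementation: `score_fn` stands in for the paper's neural ranking model that fuses scenario, question, and domain knowledge, and the function name is our own.

```python
def select_sentences(candidates, score_fn, k=3):
    """Sketch of TTGen's final step: rank the sentences generated
    from the synthesized tabular data and keep the top-k for the
    downstream MRC method. The real scorer is a learned model
    conditioned on the scenario and question; here it is any
    callable mapping a sentence to a relevance score."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:k]
```

For example, with sentence length as a stand-in scorer, `select_sentences(["a", "bb", "ccc"], len, k=2)` returns the two longest candidates.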
Researcher Affiliation | Academia | Xiao Li, Yawei Sun, Gong Cheng. State Key Laboratory for Novel Software Technology, Nanjing University, China. {xiaoli.nju, ywsun}@smail.nju.edu.cn, gcheng@nju.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and data are available on GitHub: https://github.com/nju-websoft/TSQA
Open Datasets | Yes | We constructed GeoTSQA. To the best of our knowledge, it is the first dataset dedicated to the TSQA task. Our code and data are available on GitHub: https://github.com/nju-websoft/TSQA
Dataset Splits | Yes | We performed 5-fold cross-validation. For each fold, we split GeoTSQA into 80% for training and 20% for test. For model selection, we relied on an inner holdout 80%/20% training/development split.
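The evaluation protocol quoted above (5-fold cross-validation with an inner 80%/20% train/development holdout inside each fold's training portion) can be reproduced index-wise with a short helper. This is a sketch under our own assumptions: the function name and the seed are ours, dataset loading is omitted, and the paper does not specify its shuffling procedure.

```python
import random

def tsqa_splits(n_items, n_folds=5, dev_ratio=0.2, seed=0):
    """Yield (train, dev, test) index lists for each of n_folds folds.
    Each fold holds out 1/n_folds of the data as test; the remainder
    is split 80%/20% into train/dev for model selection."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    fold_size = n_items // n_folds
    for k in range(n_folds):
        test = idx[k * fold_size:(k + 1) * fold_size]
        rest = idx[:k * fold_size] + idx[(k + 1) * fold_size:]
        cut = int(len(rest) * (1 - dev_ratio))
        yield rest[:cut], rest[cut:], test

# With the ~1k questions of GeoTSQA, each fold tests on 200 items
# and trains/selects on the remaining 800 (640 train / 160 dev).
splits = list(tsqa_splits(1000))
```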
Hardware Specification | Yes | We ran all the experiments on TITAN RTX GPUs.
Software Dependencies | No | The paper mentions specific pre-trained models such as 'BERT-wwm-ext' and 'CDial-GPT2_LCCC-base' but does not specify versions for the general software libraries and frameworks (e.g., Python, PyTorch, or TensorFlow) that are essential for reproducibility.
Experiment Setup | Yes | We set maximum sequence length = 256, self-attention layers = 12, hidden units = 768, attention heads = 12, learning rate = 1e-5, epochs = 15 for MRC and template-level ranking, epochs = 5 for sentence-level ranking, batch size = 8 for MRC, and batch size = 16 for template-level and sentence-level ranking.
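For readability, the reported hyperparameters can be collected into a single configuration mapping. The values below are exactly those quoted from the paper; the key names are our own and do not come from the authors' code.

```python
# Hyperparameters reported for GeoTSQA experiments (key names assumed).
CONFIG = {
    "max_seq_length": 256,
    "self_attention_layers": 12,
    "hidden_units": 768,
    "attention_heads": 12,
    "learning_rate": 1e-5,
    # Per-component settings: MRC reader vs. the two ranking models.
    "epochs": {"mrc": 15, "template_ranking": 15, "sentence_ranking": 5},
    "batch_size": {"mrc": 8, "template_ranking": 16, "sentence_ranking": 16},
}
```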