Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering

Authors: Wanjun Zhong, Junjie Huang, Qian Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our system on OTT-QA, a large-scale table-and-text open-domain question answering benchmark, and our system achieves the state-of-the-art performance. Further analyses illustrate that the explicit hybrid chain offers substantial performance improvement and interpretability of the intermediate reasoning process, and the chain-centric pre-training boosts the performance on the chain extraction." ... [Section 4, Experiments] "We conduct experiments to explore the effectiveness of our method from the following aspects: (1) the performance of our overall system on QA; (2) the performance of the hybrid chain extraction model; (3) the ablation study about the pretraining strategy; (4) the comprehensive qualitative analysis."
Researcher Affiliation | Collaboration | Wanjun Zhong [1], Junjie Huang [3], Qian Liu [3], Ming Zhou [4], Jiahai Wang [1], Jian Yin [1] and Nan Duan [2]. [1] The School of Computer Science and Engineering, Sun Yat-sen University; [2] Microsoft Research Asia; [3] Beihang University; [4] Langboat Technology.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code is available at https://github.com/zhongwanjun/CARP" (footnote 1 in the paper).
Open Datasets | Yes | "We evaluate our approach on the OTT-QA [Chen et al., 2020a] dataset. OTT-QA is a large-scale table-and-text open-domain question answering benchmark for evaluating open-domain question answering over both tabular and textual knowledge. OTT-QA has over 40K instances and it also provides a corpus collected from Wikipedia with over 400K tables and 6 million passages." A sketch of loading the released data appears after this table.
Dataset Splits | Yes | Evidence comes from Table 1 of the paper, whose header groups EM and F1 columns under both Dev and Test, and whose caption reads: "Table 1: Performance of different methods on the dev. set and the blind test set on OTT-QA." The EM and F1 metrics are sketched after this table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as RoBERTa, BART, Longformer, BLINK, and FAISS, but it does not specify their version numbers. A sketch of the FAISS-style retrieval step these components imply appears after this table.
Experiment Setup | No | The paper describes the model architecture and training objectives (e.g., cross-entropy loss, sparse-attention Transformer), but it does not provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or detailed optimizer settings necessary for replication. A hypothetical training-step sketch illustrating what is missing appears after this table.
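
For orientation on the Open Datasets row, here is a minimal sketch of loading and inspecting OTT-QA instances. The file path (released_data/dev.json) and field names ("question", "answer-text") are assumptions about the public release of the dataset, not details verified by this report.

```python
import json

# Minimal sketch: inspect OTT-QA instances from a local copy of the release.
# NOTE: the path and field names below are assumptions about the public
# OTT-QA release, not details taken from the paper.
with open("released_data/dev.json") as f:
    dev = json.load(f)

print(f"{len(dev)} dev instances")  # the full dataset has over 40K instances

example = dev[0]
print("Question:", example.get("question"))
print("Answer:  ", example.get("answer-text"))
```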
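
The EM and F1 numbers in Table 1 follow the standard extractive-QA convention. The sketch below mirrors the widely used SQuAD-style definitions (lowercasing, stripping punctuation and articles, then exact string match and token-level F1); it is not code from the paper.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(gold))

def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

assert exact_match("The Beatles", "beatles") == 1.0
assert 0.0 < f1("John Lennon of the Beatles", "the Beatles") < 1.0
```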
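
Since the paper names FAISS (alongside RoBERTa/BART encoders) but pins no versions, the sketch below shows the kind of dense inner-product retrieval such a stack implies. The embedding dimension, corpus size, and top-k are illustrative assumptions, not values from the paper.

```python
import numpy as np
import faiss  # the similarity-search library the paper mentions (version unpinned)

d = 768  # embedding dimension; an assumption (typical of RoBERTa-base encoders)
rng = np.random.default_rng(0)
corpus = rng.random((10_000, d), dtype=np.float32)   # stand-in table/passage embeddings
queries = rng.random((4, d), dtype=np.float32)       # stand-in question embeddings

index = faiss.IndexFlatIP(d)              # exact maximum-inner-product search
index.add(corpus)
scores, ids = index.search(queries, 15)   # top-15 candidates per question
print(ids.shape)                          # (4, 15)
```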
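
Finally, to make concrete what the missing experiment setup would need to pin down, here is a minimal cross-entropy training step in PyTorch. The model is a stand-in and every numeric value (learning rate, batch size, hidden size) is a hypothetical placeholder; the paper reports the objective but none of these values.

```python
import torch
import torch.nn as nn

# All hyperparameters below are hypothetical placeholders: the paper states
# a cross-entropy objective but does not report these values.
LEARNING_RATE = 1e-5   # assumption
BATCH_SIZE = 16        # assumption
HIDDEN_SIZE = 768      # assumption

model = nn.Linear(HIDDEN_SIZE, 2)  # stand-in for the paper's Transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(BATCH_SIZE, HIDDEN_SIZE)   # stand-in encoder outputs
labels = torch.randint(0, 2, (BATCH_SIZE,))       # stand-in supervision labels

optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```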