Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering
Authors: Wanjun Zhong, Junjie Huang, Qian Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our system on OTT-QA, a large-scale table-and-text open-domain question answering benchmark, and our system achieves the state-of-the-art performance. Further analyses illustrate that the explicit hybrid chain offers substantial performance improvement and interpretability of the intermediate reasoning process, and the chain-centric pre-training boosts the performance on the chain extraction. ... 4 Experiments We conduct experiments to explore the effectiveness of our method from the following aspects: (1) the performance of our overall system on QA; (2) the performance of the hybrid chain extraction model; (3) the ablation study about the pretraining strategy; (4) the comprehensive qualitative analysis. |
| Researcher Affiliation | Collaboration | Wanjun Zhong¹, Junjie Huang³, Qian Liu³, Ming Zhou⁴, Jiahai Wang¹, Jian Yin¹ and Nan Duan²; ¹ The School of Computer Science and Engineering, Sun Yat-sen University; ² Microsoft Research Asia; ³ Beihang University; ⁴ Langboat Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/zhongwanjun/CARP |
| Open Datasets | Yes | We evaluate our approach on the OTT-QA [Chen et al., 2020a] dataset. OTT-QA is a large-scale table-and-text open-domain question answering benchmark for evaluating open-domain question answering over both tabular and textual knowledge. OTT-QA has over 40K instances and it also provides a corpus collected from Wikipedia with over 400K tables and 6 million passages. |
| Dataset Splits | Yes | Table 1 column headers: Models; Dev (EM, F1); Test (EM, F1) ... Table 1: Performance of different methods on the dev. set and the blind test set on OTT-QA. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like RoBERTa, BART, Longformer, BLINK, and FAISS, but it does not specify their version numbers. |
| Experiment Setup | No | The paper describes the model architecture and training objectives (e.g., cross-entropy loss, sparse-attention Transformer), but it does not provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or detailed optimizer settings necessary for replication. |