OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experimental evaluation shows that OPENTAB significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system." |
| Researcher Affiliation | Collaboration | Kezhi Kong (1); Jiani Zhang (2); Zhengyuan Shen (2); Balasubramaniam Srinivasan (2); Chuan Lei (2); Christos Faloutsos (2); Huzefa Rangwala (2); George Karypis (2). (1) University of Maryland, College Park; (2) Amazon Web Services. |
| Pseudocode | No | The paper includes diagrams illustrating the system pipeline and prompting structures (e.g., Figures 1, 5, and 6), but it contains no formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "To ensure reproducibility of OPENTAB, we provide detailed discussion regarding the experimental setup in Section 4... We share the prompt used in the experiments in the Appendix." It does not, however, state that the source code for the described methodology will be released, nor does it link to a code repository. |
| Open Datasets | Yes | "To evaluate the proposed approach, we use Open-WikiTable (Kweon et al., 2023), WikiTableQuestions (Pasupat & Liang, 2015), and FEVEROUS (Aly et al., 2021) datasets." |
| Dataset Splits | No | The paper reports evaluating on "2,000 random samples from the validation set" and "323 examples from the validation set" for specific datasets, but it does not give the overall train/validation/test split counts or percentages needed to reproduce the data partitioning (see the subsampling sketch after this table). |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as CPU or GPU models or cloud computing instance types. |
| Software Dependencies | Yes | "The LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version)... Huggingface checkpoint with model name cross-encoder/ms-marco-MiniLM-L-12-v2... Huggingface checkpoint with model name bert-base-uncased." (A loading sketch for these checkpoints follows the table.) |
| Experiment Setup | Yes | "If not specified, the LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version), and the in-context learning examples are 2-shot... For BINDER, we also deploy gpt-3.5-turbo-16k to fit the long input sequences... to use 14-shot examples when doing in-context learning." (An illustrative 2-shot call appears after the table.) |
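
On the Dataset Splits row: the paper's evaluation subsets can be approximated by subsampling the public validation splits. A minimal sketch, assuming the `wikitablequestions` dataset id on the HuggingFace Hub and an arbitrary random seed (the paper specifies neither):

```python
# Sketch of the validation-set subsampling described in the paper
# (2,000 random validation examples). The Hub dataset id and the seed
# below are assumptions; the paper does not state them.
import random
from datasets import load_dataset

wtq = load_dataset("wikitablequestions")  # assumed Hub id for WikiTableQuestions
val = wtq["validation"]

random.seed(0)  # arbitrary seed; the paper does not report one
indices = random.sample(range(len(val)), 2000)
subset = val.select(indices)
print(len(subset))  # -> 2000
```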
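On the Software Dependencies row: the two HuggingFace checkpoints the paper names can be loaded with standard tooling. A minimal sketch, assuming the sentence-transformers CrossEncoder wrapper for the reranker (the paper does not say which wrapper it used, and the example inputs are ours):

```python
# Loading sketch for the checkpoints named in the paper. The wrappers and
# example inputs are assumptions; the paper releases no code.
from sentence_transformers import CrossEncoder
from transformers import AutoModel, AutoTokenizer

# Cross-encoder reranker checkpoint cited in the paper.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

# BERT checkpoint cited in the paper.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Score (query, candidate table text) pairs; higher scores mean more relevant.
scores = reranker.predict([
    ("who won the 1992 election?", "Table: 1992 election results | candidate | votes ..."),
])
print(scores)
```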
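On the Experiment Setup row: the default 2-shot configuration corresponds to a plain chat-completion call with two worked examples in the context. A minimal sketch; the prompt wording and placeholders are ours, not the paper's actual prompt (which is shared in its Appendix):

```python
# Illustrative 2-shot in-context call to gpt-3.5-turbo. The prompt text
# and placeholders are assumptions; the paper's real prompts are in its Appendix.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

few_shot = [  # two worked examples, matching the paper's default 2-shot setting
    {"role": "user", "content": "Table:\n<serialized table 1>\nQuestion: <q1>"},
    {"role": "assistant", "content": "<SQL program 1>"},
    {"role": "user", "content": "Table:\n<serialized table 2>\nQuestion: <q2>"},
    {"role": "assistant", "content": "<SQL program 2>"},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # default 4k-context version, per the paper
    messages=(
        [{"role": "system", "content": "Generate SQL over the given table."}]
        + few_shot
        + [{"role": "user", "content": "Table:\n<serialized table>\nQuestion: <q>"}]
    ),
    temperature=0,
)
print(response.choices[0].message.content)
```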