OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental evaluation shows that OPENTAB significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system. |
| Researcher Affiliation | Collaboration | Kezhi Kong¹, Jiani Zhang², Zhengyuan Shen², Balasubramaniam Srinivasan², Chuan Lei², Christos Faloutsos², Huzefa Rangwala², George Karypis²; ¹University of Maryland, College Park; ²Amazon Web Services |
| Pseudocode | No | The paper includes diagrams illustrating the system pipeline and prompting structures (e.g., Figure 1, Figure 5, Figure 6), but it does not contain formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'To ensure reproducibility of OPENTAB, we provide detailed discussion regarding the experimental setup in Section 4... We share the prompt used in the experiments in the Appendix.' However, it does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | To evaluate the proposed approach, we use Open-WikiTable (Kweon et al., 2023), WikiTableQuestions (Pasupat & Liang, 2015), and FEVEROUS (Aly et al., 2021) datasets. |
| Dataset Splits | No | The paper states that experiments were carried out on '2,000 random samples from the validation set' and '323 examples from the validation set' for specific datasets, but it does not provide the overall training/validation/test split percentages or counts needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud computing instance types. |
| Software Dependencies | Yes | The LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version)... Huggingface checkpoint with model name cross-encoder/ms-marco-MiniLM-L-12-v2... Huggingface checkpoint with model name bert-base-uncased. |
| Experiment Setup | Yes | If not specified, the LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version), and the in-context learning examples are 2-shot... For BINDER, we also deploy gpt-3.5-turbo-16k to fit the long input sequences... to use 14-shot examples when doing in-context learning. |
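For reference, a minimal sketch of loading the dependencies named in the Software Dependencies row (the two Hugging Face checkpoints). The checkpoint names come from the paper; the example query, candidate tables, and all variable names are hypothetical stand-ins, not the authors' code:

```python
# Minimal sketch, assuming sentence-transformers and transformers are installed.
from sentence_transformers import CrossEncoder
from transformers import AutoModel, AutoTokenizer

# Reranker checkpoint cited in the paper.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

# BERT checkpoint cited in the paper.
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_model = AutoModel.from_pretrained("bert-base-uncased")

# Score hypothetical (question, linearized table) pairs with the cross-encoder.
query = "Which country hosted the 1998 World Cup?"
candidate_tables = [
    "title: FIFA World Cup hosts | year | host ...",
    "title: Olympic host cities | year | city ...",
]
scores = reranker.predict([(query, t) for t in candidate_tables])
ranked = sorted(zip(candidate_tables, scores), key=lambda x: -x[1])
print(ranked[0][0])  # highest-scoring candidate table
```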
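Similarly, a minimal sketch of the 2-shot in-context learning call described in the Experiment Setup row, using the OpenAI Python SDK (v1+). Only the model name gpt-3.5-turbo comes from the paper; the demonstrations and prompt wording below are hypothetical, not the authors' actual prompts (those are shared in the paper's Appendix):

```python
# Minimal sketch of a 2-shot prompt to gpt-3.5-turbo; not the authors' prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two hypothetical in-context demonstrations, then the test question.
prompt = """Answer the question using the given table.

Table: year | host
1994 | United States
Question: Which country hosted the 1994 World Cup?
Answer: United States

Table: year | host
2002 | South Korea and Japan
Question: Which countries hosted the 2002 World Cup?
Answer: South Korea and Japan

Table: year | host
1998 | France
Question: Which country hosted the 1998 World Cup?
Answer:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```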