OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental evaluation shows that OPENTAB significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system. |
| Researcher Affiliation | Collaboration | Kezhi Kong¹, Jiani Zhang², Zhengyuan Shen², Balasubramaniam Srinivasan², Chuan Lei², Christos Faloutsos², Huzefa Rangwala², George Karypis²; ¹University of Maryland, College Park; ²Amazon Web Services |
| Pseudocode | No | The paper includes diagrams illustrating the system pipeline and prompting structures (e.g., Figure 1, Figure 5, Figure 6), but it does not contain formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'To ensure reproducibility of OPENTAB, we provide detailed discussion regarding the experimental setup in Section 4... We share the prompt used in the experiments in the Appendix.' However, it does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | To evaluate the proposed approach, we use Open-WikiTable (Kweon et al., 2023), WikiTableQuestions (Pasupat & Liang, 2015), and FEVEROUS (Aly et al., 2021) datasets. |
| Dataset Splits | No | The paper states that experiments were carried out on '2,000 random samples from the validation set' and '323 examples from the validation set' for specific datasets, but it does not provide the overall training/validation/test split percentages or counts needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud computing instance types. |
| Software Dependencies | Yes | The LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version)... Huggingface checkpoint with model name cross-encoder/ms-marco-MiniLM-L-12-v2... Huggingface checkpoint with model name bert-base-uncased. |
| Experiment Setup | Yes | If not specified, the LLM backbone used in this work is gpt-3.5-turbo (default 4k-token version), and the in-context learning examples are 2-shot... For BINDER, we also deploy gpt-3.5-turbo-16k to fit the long input sequences... to use 14-shot examples when doing in-context learning. |
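For reference, a minimal sketch of loading the dependencies named in the Software Dependencies row (the two Hugging Face checkpoints). The checkpoint names come from the paper; the example query, candidate tables, and all variable names are hypothetical stand-ins, not the authors' code:

```python
# Minimal sketch, assuming sentence-transformers and transformers are installed.
from sentence_transformers import CrossEncoder
from transformers import AutoModel, AutoTokenizer

# Reranker checkpoint cited in the paper.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

# BERT checkpoint cited in the paper.
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_model = AutoModel.from_pretrained("bert-base-uncased")

# Score hypothetical (question, linearized table) pairs with the cross-encoder.
query = "Which country hosted the 1998 World Cup?"
candidate_tables = [
    "title: FIFA World Cup hosts | year | host ...",
    "title: Olympic host cities | year | city ...",
]
scores = reranker.predict([(query, t) for t in candidate_tables])
ranked = sorted(zip(candidate_tables, scores), key=lambda x: -x[1])
print(ranked[0][0])  # highest-scoring candidate table
```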
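Similarly, a minimal sketch of the 2-shot in-context learning call described in the Experiment Setup row, using the OpenAI Python SDK (v1+). Only the model name gpt-3.5-turbo comes from the paper; the demonstrations and prompt wording below are hypothetical, not the authors' actual prompts (those are shared in the paper's Appendix):

```python
# Minimal sketch of a 2-shot prompt to gpt-3.5-turbo; not the authors' prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two hypothetical in-context demonstrations, then the test question.
prompt = """Answer the question using the given table.

Table: year | host
1994 | United States
Question: Which country hosted the 1994 World Cup?
Answer: United States

Table: year | host
2002 | South Korea and Japan
Question: Which countries hosted the 2002 World Cup?
Answer: South Korea and Japan

Table: year | host
1998 | France
Question: Which country hosted the 1998 World Cup?
Answer:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```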