Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We conduct our experiments using PaLM 2 (Anil et al., 2023) and GPT-3.5 (Brown et al., 2020; OpenAI, 2023) as the backbone LLMs.
Researcher Affiliation | Collaboration | 1. University of California, San Diego; 2. Google Cloud AI Research; 3. Google Research
Pseudocode | Yes | Algorithm 1: CHAIN-OF-TABLE Prompting (a sketch of this loop is given after the table)
Open Source Code | No | The paper states: 'We run Text-to-SQL and Binder using the official open-sourced code and prompts in https://github.com/HKUNLP/Binder. We run Dater using the official open-sourced code and prompts in https://github.com/AlibabaResearch/DAMO-ConvAI.' This refers to the code for the baseline methods, not the code for the CHAIN-OF-TABLE framework itself.
Open Datasets | Yes | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019).
Dataset Splits | Yes | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We incorporate few-shot demo samples from the training set into the prompts to perform in-context learning. We guarantee that all demo samples are from the training set so they are unseen during testing.
Hardware Specification | No | The paper mentions using 'PaLM 2' and 'GPT-3.5' as backbone LLMs but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'PaLM 2-S' and 'GPT-3.5 (turbo-16k-0613)' as backbone LLMs, which are specific models, but it does not list other software dependencies (e.g., programming languages, libraries, frameworks) with version numbers.
Experiment Setup | Yes | We report the parameters and the numbers of demo samples used in CHAIN-OF-TABLE in Tables 7, 8, and 9. Overall, we annotate 29 samples and reuse them across the different datasets; there is substantial overlap in the demo samples used for the different functions. For example, we use the same demo sample to introduce how to use f_add_column in the DynamicPlan function across different datasets. We guarantee that all demo samples are from the training set so they are unseen during testing.
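
The Pseudocode row above refers to Algorithm 1 (CHAIN-OF-TABLE Prompting), which iterates over the table: DynamicPlan asks the LLM to select the next atomic table operation, GenerateArgs asks it to fill in that operation's arguments, the operation is executed to produce an updated table, and once the chain terminates a final query answers the question from the evolved table. Below is a minimal sketch of that loop, assuming a generic `llm` callable (prompt string in, completion string out) and a pandas DataFrame as the table; the operation names come from the paper, but the prompt templates and the partially implemented `execute` helper are illustrative placeholders rather than the authors' implementation.

```python
# Minimal sketch of Chain-of-Table prompting (Algorithm 1), under the
# assumptions stated above. `llm` is any callable: prompt str -> completion str.
import pandas as pd

ATOMIC_OPERATIONS = [
    "f_add_column", "f_select_row", "f_select_column", "f_group_by", "f_sort_by",
]

def dynamic_plan(llm, table: pd.DataFrame, question: str, chain: list) -> str:
    """DynamicPlan: the LLM picks the next atomic operation, or [E] to stop."""
    prompt = (
        f"Table:\n{table.to_string(index=False)}\n"
        f"Question: {question}\n"
        f"Operation chain so far: {chain}\n"
        f"Choose the next operation from {ATOMIC_OPERATIONS}, or [E] to stop:"
    )
    return llm(prompt).strip()

def generate_args(llm, table: pd.DataFrame, question: str, operation: str) -> str:
    """GenerateArgs: the LLM produces the arguments of the chosen operation."""
    prompt = (
        f"Table:\n{table.to_string(index=False)}\n"
        f"Question: {question}\n"
        f"Generate the arguments for {operation}:"
    )
    return llm(prompt).strip()

def execute(table: pd.DataFrame, operation: str, args: str) -> pd.DataFrame:
    """Apply the operation to the table; only two operations are sketched here."""
    if operation == "f_select_column":
        return table[[col.strip() for col in args.split(",")]]
    if operation == "f_sort_by":
        return table.sort_values(by=args.strip())
    # f_add_column, f_select_row, and f_group_by would be handled analogously.
    return table

def chain_of_table(llm, table: pd.DataFrame, question: str, max_steps: int = 5) -> str:
    """Iteratively evolve the table, then answer the question from the final table."""
    chain = ["[B]"]
    for _ in range(max_steps):
        operation = dynamic_plan(llm, table, question, chain)
        if operation == "[E]":
            break
        args = generate_args(llm, table, question, operation)
        table = execute(table, operation, args)
        chain.append(f"{operation}({args})")
    final_prompt = (
        f"Table:\n{table.to_string(index=False)}\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(final_prompt).strip()
```

In the paper's setup, each of these prompts would additionally carry the few-shot demo samples drawn from the training split (the 29 annotated samples reused across functions and datasets, per the Dataset Splits and Experiment Setup rows); the sketch omits them for brevity.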