Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We conduct our experiments using PaLM 2 (Anil et al., 2023) and GPT-3.5 (Brown et al., 2020; OpenAI, 2023) as the backbone LLMs. |
| Researcher Affiliation | Collaboration | University of California, San Diego; Google Cloud AI Research; Google Research |
| Pseudocode | Yes | Algorithm 1: CHAIN-OF-TABLE Prompting (a hedged sketch of this loop is given below the table) |
| Open Source Code | No | The paper states: 'We run Text-to-SQL and Binder using the official open-sourced code and prompts in https://github.com/HKUNLP/Binder. We run Dater using the official open-sourced code and prompts in https://github.com/AlibabaResearch/DAMO-ConvAI.' This refers to the code for the baseline methods, not to the code for the CHAIN-OF-TABLE framework itself. |
| Open Datasets | Yes | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). |
| Dataset Splits | Yes | We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We incorporate few-shot demo samples from the training set into the prompts to perform in-context learning. We guarantee that all demo samples are from the training set so they are unseen during testing. |
| Hardware Specification | No | The paper mentions using 'PaLM 2' and 'GPT-3.5' as backbone LLMs but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PaLM 2-S' and 'GPT-3.5 (turbo-16k-0613)' as backbone LLMs, which are specific models, but it does not list other software dependencies (e.g., programming languages, libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | We report the parameters and demo sample numbers used in CHAIN-OF-TABLE in Tables 7, 8, and 9. Overall, we annotate 29 samples and reuse them across the different datasets; there is substantial overlap in how the samples are used across functions. For example, the same demo sample is used to introduce how to use f_add_column in the DynamicPlan function across different datasets. We guarantee that all demo samples are from the training set so they are unseen during testing. |
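
The table above references Algorithm 1 (CHAIN-OF-TABLE Prompting) and the atomic table operations (e.g., f_add_column) driven by DynamicPlan. The sketch below is a minimal illustration of that iterative plan-generate-apply loop as described in the paper; the helper callables (`call_llm`, `apply_op`), the prompt strings, and the step limit are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the CHAIN-OF-TABLE loop (Algorithm 1):
# DynamicPlan -> GenerateArgs -> apply operation -> Query.
# All names and prompts here are illustrative placeholders.

ATOMIC_OPS = ["f_add_column", "f_select_row", "f_select_column",
              "f_group_by", "f_sort_by"]

def chain_of_table(table, question, call_llm, apply_op, max_steps=5):
    """Iteratively plan and apply table operations, then answer the question.

    `call_llm(prompt) -> str` and `apply_op(table, op, args) -> table` are
    supplied by the caller (e.g., a wrapper around PaLM 2 / GPT-3.5 and a
    deterministic table executor).
    """
    chain = ["[B]"]  # operation history; [B] marks the beginning of the chain
    for _ in range(max_steps):
        # DynamicPlan: the LLM selects the next operation, or [E] to stop.
        op = call_llm(f"Question: {question}\nTable: {table}\n"
                      f"History: {chain}\nChoose one of {ATOMIC_OPS} or [E].").strip()
        if op == "[E]" or op not in ATOMIC_OPS:
            break
        # GenerateArgs: the LLM fills in the arguments for the chosen operation.
        args = call_llm(f"Give arguments for {op} on table: {table}\n"
                        f"Question: {question}").strip()
        # Apply the operation so the table evolves along the reasoning chain.
        table = apply_op(table, op, args)
        chain.append(f"{op}({args})")
    # Query: answer the question from the final, simplified table.
    return call_llm(f"Answer the question using this table: {table}\n"
                    f"Question: {question}")
```

The key design point this sketch tries to capture is that the LLM stays in a planning role while a deterministic executor mutates the table, so each intermediate table in the chain can be inspected and serves as context for the next planning step.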