reproducibilityindex.ai

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We conduct our experiments using PaLM 2 (Anil et al., 2023) and GPT-3.5 (Brown et al., 2020; OpenAI, 2023) as the backbone LLMs.
Researcher Affiliation	Collaboration	University of California, San Diego; Google Cloud AI Research; Google Research
Pseudocode	Yes	Algorithm 1: CHAIN-OF-TABLE Prompting
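Algorithm 1 describes an iterative loop. At each step the LLM plans the next atomic table operation, generates its arguments, and executes it on the current table. The operation is then appended to an operation chain, and the loop continues until the end tag [E] is chosen. The evolved table is then used to answer the question. The Python below is a minimal, non-authoritative sketch of that loop under stated assumptions. The (header, rows) table representation, the operation bodies, and the `scripted_planner` stand-in for the LLM are all hypothetical. Planning and argument generation are merged into one call, and the final LLM query step is omitted. The operation names follow the paper's f_* naming (f_add_column, f_select_row, f_select_column, f_group_by, f_sort_by).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table representation: (header, rows), every cell a string.
Table = Tuple[List[str], List[List[str]]]


def f_select_column(table: Table, cols: List[str]) -> Table:
    """Keep only the named columns, in the given order."""
    header, rows = table
    idx = [header.index(c) for c in cols]
    return [header[i] for i in idx], [[r[i] for i in idx] for r in rows]


def f_select_row(table: Table, row_ids: List[int]) -> Table:
    """Keep only the rows at the given 0-based positions."""
    header, rows = table
    return header, [rows[i] for i in row_ids]


def f_add_column(table: Table, name: str, values: List[str]) -> Table:
    """Append a new column; in Chain-of-Table an LLM would supply the values."""
    header, rows = table
    return header + [name], [r + [v] for r, v in zip(rows, values)]


def _sort_key(cell: str) -> Tuple[int, float, str]:
    # Numeric cells sort numerically and before text cells; text sorts lexicographically.
    try:
        return (0, float(cell), "")
    except ValueError:
        return (1, 0.0, cell)


def f_sort_by(table: Table, col: str, descending: bool = False) -> Table:
    header, rows = table
    i = header.index(col)
    return header, sorted(rows, key=lambda r: _sort_key(r[i]), reverse=descending)


def f_group_by(table: Table, col: str) -> Table:
    """Group on one column and count occurrences (first-seen order)."""
    header, rows = table
    i = header.index(col)
    counts: Dict[str, int] = {}
    for r in rows:
        counts[r[i]] = counts.get(r[i], 0) + 1
    return [col, "Count"], [[k, str(v)] for k, v in counts.items()]


OPERATIONS: Dict[str, Callable[..., Table]] = {
    "f_select_column": f_select_column,
    "f_select_row": f_select_row,
    "f_add_column": f_add_column,
    "f_sort_by": f_sort_by,
    "f_group_by": f_group_by,
}

# Hypothetical planner type standing in for the LLM: given the current table,
# question and chain so far, return the next operation (or "[E]") and its arguments.
Planner = Callable[[Table, str, List[str]], Tuple[str, dict]]


def chain_of_table(table: Table, question: str, plan: Planner, max_steps: int = 5):
    chain = ["[B]"]  # the operation chain starts with the begin tag
    for _ in range(max_steps):
        op, args = plan(table, question, chain)
        if op == "[E]":
            break
        table = OPERATIONS[op](table, **args)  # execute on the current table
        chain.append(op)
    chain.append("[E]")
    return table, chain  # a final LLM query would read the returned table


# Usage with a scripted (hypothetical) planner instead of a real LLM.
medals: Table = (
    ["Country", "Medal"],
    [["USA", "Gold"], ["China", "Gold"], ["USA", "Silver"], ["USA", "Bronze"], ["China", "Silver"]],
)
script = iter([
    ("f_select_column", {"cols": ["Country"]}),
    ("f_group_by", {"col": "Country"}),
    ("f_sort_by", {"col": "Count", "descending": True}),
    ("[E]", {}),
])


def scripted_planner(table: Table, question: str, chain: List[str]) -> Tuple[str, dict]:
    return next(script)


final_table, chain = chain_of_table(medals, "Which country won the most medals?", scripted_planner)
print(chain)        # ['[B]', 'f_select_column', 'f_group_by', 'f_sort_by', '[E]']
print(final_table)  # (['Country', 'Count'], [['USA', '3'], ['China', '2']])
```

Keeping each operation a pure function from table to table makes the intermediate tables easy to inspect. It also lets a real LLM planner replace the scripted one without touching the execution loop.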
Open Source Code	No	The paper states: 'We run Text-to-SQL and Binder using the official open-sourced code and prompts in https://github.com/HKUNLP/Binder. We run Dater using the official open-sourced code and prompts in https://github.com/AlibabaResearch/DAMO-ConvAI.' This refers to the code for the baseline methods, not to code for the CHAIN-OF-TABLE framework itself.
Open Datasets	Yes	We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019).
Dataset Splits	Yes	We evaluate the proposed CHAIN-OF-TABLE on three public table understanding benchmarks: WikiTQ (Pasupat & Liang, 2015), FeTaQA (Nan et al., 2022), and TabFact (Chen et al., 2019). We incorporate few-shot demo samples from the training set into the prompts to perform in-context learning. We guarantee that all demo samples are from the training set so they are unseen during testing.
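The split guarantee above amounts to a leakage check performed while the few-shot prompt is assembled. The Python below is a minimal sketch under assumptions. The example dict fields (`id`, `split`, `table`, `question`, `answer`), the `build_few_shot_prompt` helper, and the "Table / Question / Answer" prompt layout are all hypothetical, not the paper's actual prompt format. It rejects any demo that is not from the training split or that shares an id with the evaluation set, then appends the test query with an empty answer.

```python
from typing import Dict, List, Set

# Hypothetical sample format: each demo/test item is a dict of string fields.
Example = Dict[str, str]


def build_few_shot_prompt(demos: List[Example], query: Example, test_ids: Set[str]) -> str:
    """Concatenate train-split demos, then the unanswered test query."""
    for d in demos:
        # Leakage guard: demos must come from the training split
        # and must not share an id with any evaluation example.
        if d["split"] != "train":
            raise ValueError(f"demo {d['id']} is not from the training split")
        if d["id"] in test_ids:
            raise ValueError(f"demo {d['id']} overlaps the test set")
    blocks = [
        f"Table:\n{d['table']}\nQuestion: {d['question']}\nAnswer: {d['answer']}"
        for d in demos
    ]
    # The test query goes last with an empty answer for the LLM to complete.
    blocks.append(f"Table:\n{query['table']}\nQuestion: {query['question']}\nAnswer:")
    return "\n\n".join(blocks)


demo = {"id": "train-1", "split": "train", "table": "Year | Winner\n2001 | A",
        "question": "Who won in 2001?", "answer": "A"}
query = {"id": "test-7", "split": "test", "table": "Year | Winner\n2002 | B",
         "question": "Who won in 2002?"}
prompt = build_few_shot_prompt([demo], query, test_ids={"test-7"})
print(prompt.endswith("Answer:"))  # True
```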
Hardware Specification	No	The paper names PaLM 2 and GPT-3.5 as the backbone LLMs but does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies	No	The paper specifies the backbone LLM versions, PaLM 2-S and GPT-3.5 (turbo-16k-0613), but does not list other software dependencies (e.g., programming languages, libraries, frameworks) with version numbers.
Experiment Setup	Yes	We report the parameters and numbers of demo samples used in CHAIN-OF-TABLE in Tables 7, 8, and 9. Overall, we annotate 29 samples and use them across different datasets. There is a large overlap in the samples used for different functions. For example, we use the same demo sample to introduce how to use f_add_column in the function DynamicPlan across different datasets. We guarantee that all demo samples are from the training set so they are unseen during testing.
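Reusing a small annotated set across datasets and functions can be organized as a shared demo pool plus per-dataset references into it. The Python below is a minimal sketch of that bookkeeping, assuming a hypothetical registry. The demo ids, placeholder texts, stage keys, and the `demos_for` / `unique_demo_ids` helpers are illustrative. They are not the paper's 29 samples or the contents of its Tables 7 to 9. Counting unique ids shows how one annotated sample (here the f_add_column demo used in DynamicPlan) serves several datasets while being annotated only once.

```python
from typing import Dict, List, Set

# Hypothetical shared pool of annotated demos (placeholder ids and texts,
# not the paper's actual samples).
DEMO_POOL: Dict[str, str] = {
    "demo_add_column": "<annotated f_add_column example>",
    "demo_select_row": "<annotated f_select_row example>",
    "demo_group_by": "<annotated f_group_by example>",
}

# Hypothetical per-dataset configuration: prompting stage -> demo ids in the pool.
DEMO_CONFIG: Dict[str, Dict[str, List[str]]] = {
    "WikiTQ": {"DynamicPlan": ["demo_add_column", "demo_select_row"],
               "f_add_column": ["demo_add_column"]},
    "TabFact": {"DynamicPlan": ["demo_add_column", "demo_group_by"],
                "f_add_column": ["demo_add_column"]},
}


def demos_for(dataset: str, stage: str) -> List[str]:
    """Resolve the demo texts a dataset uses at a given prompting stage."""
    return [DEMO_POOL[i] for i in DEMO_CONFIG[dataset][stage]]


def unique_demo_ids(config: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Each annotated sample counts once, however many datasets/stages reuse it."""
    return {i for stages in config.values() for ids in stages.values() for i in ids}


print(len(unique_demo_ids(DEMO_CONFIG)))  # 3
```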