TableRAG: Million-Token Table Understanding with Language Models

Authors: Si-An Chen, Lesly Miculicich, Julian Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Empirical Studies
Researcher Affiliation | Collaboration | National Taiwan University, Google Cloud AI Research, Google DeepMind, UC San Diego
Pseudocode | Yes | The pseudocode and an answering example on ArcadeQA can be found in Alg. 1 and Fig. 8 respectively. Algorithm 1: TableRAG Algorithm. (A hedged sketch of the retrieval step follows this table.)
Open Source Code | Yes | The implementation and dataset will be available at https://github.com/google-research/google-research/tree/master/table_rag.
Open Datasets | Yes | We build two new million-token benchmarks sourced from the real-world Arcade [26] and BIRD-SQL [7] datasets. Additionally, to assess performance across various scales, we generated synthetic data expanding tables from the TabFact dataset to larger sizes, while maintaining consistent questions and key table content for evaluation.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or counts in the main text; it mentions 'evaluation' and 'test' sets but no specific 'validation' split.
Hardware Specification | No | Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19], and Mistral-Nemo-Instruct-2407 as LM solvers. In the ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval.
Software Dependencies | No | Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19], and Mistral-Nemo-Instruct-2407 as LM solvers. In the ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval. For TableRAG, we set the cell encoding budget B = 10,000 and the retrieval limit K = 5. For RandRowSampling and RowColRetrieval, we increase the retrieval limit to K = 30.
Experiment Setup | Yes | Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19], and Mistral-Nemo-Instruct-2407 as LM solvers. In the ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval. For TableRAG, we set the cell encoding budget B = 10,000 and the retrieval limit K = 5. For RandRowSampling and RowColRetrieval, we increase the retrieval limit to K = 30. Each experiment is conducted 10 times and evaluated by majority voting to ensure stability and consistency. The evaluation metric is exact-match accuracy if not specified. (Sketches of the retrieval step and the evaluation protocol follow this table.)
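
The quoted setup fixes the two retrieval hyperparameters: a cell encoding budget B = 10,000 and a retrieval limit K = 5. Below is a minimal sketch of budget-truncated cell retrieval in that spirit. It is not the paper's implementation (that lives in the linked repository); encode(), retrieve_cells(), and top_k() are hypothetical names assumed for illustration.

    from collections import Counter
    import numpy as np

    def top_k(query_vec, cand_vecs, k):
        # Rank candidates by cosine similarity to the query embedding.
        q = query_vec / np.linalg.norm(query_vec)
        c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
        return np.argsort(-(c @ q))[:k]

    def retrieve_cells(table, query, encode, budget=10_000, k=5):
        """table: list of dict rows; returns up to k '{column}: {value}' strings."""
        # Keep only distinct (column, value) pairs, most frequent first,
        # truncated to the encoding budget B so huge tables stay affordable to embed.
        counts = Counter((col, str(val)) for row in table for col, val in row.items())
        cells = [f"{col}: {val}" for (col, val), _ in counts.most_common(budget)]
        vecs = np.array([encode(c) for c in cells])
        return [cells[i] for i in top_k(np.asarray(encode(query)), vecs, k)]

With the OpenAI Python client, encode could be as simple as lambda t: client.embeddings.create(model="text-embedding-3-large", input=[t]).data[0].embedding, though in practice the inputs would be batched.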
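
The evaluation protocol quoted in the last row (10 runs per experiment, majority voting, exact-match accuracy) is mechanical enough to sketch. This is an assumption-laden illustration: normalize() is a placeholder for whatever answer cleanup each benchmark applies.

    from collections import Counter

    def normalize(ans: str) -> str:
        # Placeholder cleanup; the benchmarks' real normalization may differ.
        return ans.strip().lower()

    def majority_vote_accuracy(runs_per_example, gold_answers):
        """runs_per_example[i] holds the 10 model answers for example i."""
        correct = 0
        for runs, gold in zip(runs_per_example, gold_answers):
            voted, _ = Counter(normalize(r) for r in runs).most_common(1)[0]
            correct += voted == normalize(gold)
        return correct / len(gold_answers)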