reproducibilityindex.ai

TableRAG: Million-Token Table Understanding with Language Models

Authors: Si-An Chen, Lesly Miculicich, Julian Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Empirical Studies
Researcher Affiliation	Collaboration	National Taiwan University, Google Cloud AI Research, Google DeepMind, UC San Diego
Pseudocode	Yes	The pseudocode and an answering example on ArcadeQA can be found in Alg. 1 and Fig. 8, respectively. Algorithm 1: TableRAG Algorithm
Open Source Code	Yes	The implementation and dataset will be available at https://github.com/google-research/google-research/tree/master/table_rag.
Open Datasets	Yes	We build two new million-token benchmarks sourced from the real-world Arcade [26] and BIRD-SQL [7] datasets. Additionally, to assess performance across various scales, we generated synthetic data expanding tables from the TabFact dataset to larger sizes, while maintaining consistent questions and key table content for evaluation.
Dataset Splits	No	The paper doesn't explicitly provide training/validation/test dataset splits with percentages or counts in the main text. It mentions using 'evaluation' and 'test' but not specific 'validation' splits.
Hardware Specification	No	Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19] and Mistral-Nemo-Instruct-2407 as LM solvers. In ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval.
Software Dependencies	No	Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19] and Mistral-Nemo-Instruct-2407 as LM solvers. In ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval. For TableRAG, we set the cell encoding budget B = 10,000 and the retrieval limit K = 5. For RandRowSampling and RowColRetrieval, we increase the retrieval limit to K = 30.
Experiment Setup	Yes	Our experiments employ GPT-3.5-turbo [1], Gemini-1.0-Pro [19] and Mistral-Nemo-Instruct-2407 as LM solvers. In ablation study, we use GPT-3.5-turbo if not specified. We use OpenAI's text-embedding-3-large as the encoder for dense retrieval. For TableRAG, we set the cell encoding budget B = 10,000 and the retrieval limit K = 5. For RandRowSampling and RowColRetrieval, we increase the retrieval limit to K = 30. Each experiment is conducted 10 times and evaluated by majority-voting to ensure the stability and consistency. The evaluation metric is the exact-match accuracy if not specified.
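
The evaluation protocol quoted in the Experiment Setup row (several runs per question, a majority vote over the runs, then exact-match accuracy) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the whitespace and case normalization, and the tie-breaking rule (the first answer seen wins) are all assumptions that the quoted text does not specify.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    # The paper excerpt does not say how answers are normalized.
    return " ".join(answer.strip().lower().split())


def majority_vote(answers: list[str]) -> str:
    # Pick the most frequent normalized answer across repeated runs.
    # Ties go to the answer that appeared first (an assumption).
    if not answers:
        raise ValueError("need at least one answer to vote on")
    counts = Counter(normalize(a) for a in answers)
    return counts.most_common(1)[0][0]


def exact_match_accuracy(runs_per_question: list[list[str]],
                         gold_answers: list[str]) -> float:
    # Fraction of questions whose voted answer equals the gold answer exactly.
    if not gold_answers or len(runs_per_question) != len(gold_answers):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        majority_vote(answers) == normalize(gold)
        for answers, gold in zip(runs_per_question, gold_answers)
    )
    return correct / len(gold_answers)


if __name__ == "__main__":
    # Hypothetical data: 10 runs for the first question, 3 for the second.
    runs = [["42"] * 6 + ["41"] * 4, ["Yes", "no", "no "]]
    gold = ["42", "yes"]
    print(exact_match_accuracy(runs, gold))
```

Voting before scoring means a single run's error does not count against the method as long as most runs agree on the correct answer.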