TabFact: A Large-scale Dataset for Table-based Fact Verification

Authors: Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to investigate their performances: the best-achieved accuracy of both models is reasonable, but far below human performance.
Researcher Affiliation | Collaboration | University of California, Santa Barbara, CA, USA; Tencent AI Lab, Bellevue, WA, USA
Pseudocode | Yes | Algorithm 1 Latent Program Search with Comments
Open Source Code | Yes | The data and code of the dataset are provided in https://github.com/wenhuchen/Table-Fact-Checking.
Open Datasets | Yes | To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. The data and code of the dataset are provided in https://github.com/wenhuchen/Table-Fact-Checking.
Dataset Splits | Yes | We split the whole data roughly with 8:1:1 into train, validation, and test splits and show their statistics in Table 1. Table 1: ... Val 12,792 (An illustrative split sketch follows the table.)
Hardware Specification | Yes | We finetune the model on a single TITAN X GPU with a mini-batch size of 6. ... We run the latent program search in a distributed fashion on three 64-core machines.
Software Dependencies | No | The paper mentions using an "open-source implementation of BERT" and a "Transformer-based two-way encoder" but does not provide specific version numbers for these or for other software libraries/frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We finetune the model on a single TITAN X GPU with a mini-batch size of 6. The best performance is reached after about 3 hours of training (around 10K steps). ... For the discriminator model, we design two transformer-based encoders (3 layers, 128-dimension hidden embedding, and 4 heads at each layer) to encode the programs and statements, respectively. (An illustrative encoder sketch follows the table.)
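
To make the Dataset Splits row concrete, here is a minimal sketch of a roughly 8:1:1 table-level split. It is illustrative only: the released TabFact repository ships fixed split files, and the table_ids list and split_8_1_1 helper below are hypothetical names, not part of the paper or its code.

```python
import random

# Hedged sketch: the paper reports splitting the data roughly 8:1:1 into
# train/validation/test. Splitting at the table level (so all statements
# about one table land in the same split) is assumed here for illustration.

def split_8_1_1(table_ids, seed=0):
    """Shuffle table identifiers and cut them into ~80/10/10 partitions."""
    ids = list(table_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical identifiers standing in for the ~16k Wikipedia tables.
train_ids, val_ids, test_ids = split_8_1_1([f"table-{i}" for i in range(16000)])
```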
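For the Experiment Setup row, the sketch below shows one way to instantiate the reported discriminator configuration (two Transformer encoders with 3 layers, 128-dimensional hidden embeddings, and 4 heads per layer) in PyTorch. The vocabulary size, mean pooling, and dot-product scoring head are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 30000   # assumed placeholder vocabulary size
HIDDEN_DIM = 128     # reported hidden embedding size
NUM_LAYERS = 3       # reported number of layers
NUM_HEADS = 4        # reported attention heads per layer


def make_encoder() -> nn.TransformerEncoder:
    """Build one small Transformer encoder with the reported hyperparameters."""
    layer = nn.TransformerEncoderLayer(
        d_model=HIDDEN_DIM, nhead=NUM_HEADS,
        dim_feedforward=4 * HIDDEN_DIM, batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=NUM_LAYERS)


class ProgramStatementDiscriminator(nn.Module):
    """Scores (program, statement) pairs with two separate encoders."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN_DIM)
        self.program_encoder = make_encoder()
        self.statement_encoder = make_encoder()

    def forward(self, program_ids, statement_ids):
        # Encode both token-id sequences and mean-pool over time
        # (the pooling choice is an assumption).
        prog = self.program_encoder(self.embed(program_ids)).mean(dim=1)
        stmt = self.statement_encoder(self.embed(statement_ids)).mean(dim=1)
        # Score each pair by the dot product of the pooled representations.
        return (prog * stmt).sum(dim=-1)
```

Usage: scores = ProgramStatementDiscriminator()(program_ids, statement_ids), where both inputs are integer tensors of shape (batch, seq_len).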