reproducibilityindex.ai

TabFact: A Large-scale Dataset for Table-based Fact Verification

Authors: Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform extensive experiments to investigate their performances: the best-achieved accuracy of both models are reasonable, but far below human performance.
Researcher Affiliation	Collaboration	University of California, Santa Barbara, CA, USA; Tencent AI Lab, Bellevue, WA, USA
Pseudocode	Yes	Algorithm 1 Latent Program Search with Comments
Open Source Code	Yes	The data and code of the dataset are provided in https://github.com/wenhuchen/Table-Fact-Checking.
Open Datasets	Yes	To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. The data and code of the dataset are provided in https://github.com/wenhuchen/Table-Fact-Checking.
Dataset Splits	Yes	We split the whole data roughly with 8:1:1 into train, validation, and test splits and shows their statistics in Table 1. Table 1: ... Val 12,792
Hardware Specification	Yes	We finetune the model on a single TITAN X GPU with a mini-batch size of 6. ... We run the latent program search in a distributed fashion on three 64-core machines
Software Dependencies	No	The paper mentions using "open-source implementation of BERT" and "Transformer-based two-way encoder" but does not provide specific version numbers for these or other software libraries/frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup	Yes	We finetune the model on a single TITAN X GPU with a mini-batch size of 6. The best performance is reached after about 3 hours of training (around 10K steps). ... For the discriminator model, we design two transformer-based encoders (3 layers, 128-dimension hidden embedding, and 4 heads at each layer) to encode the programs and statements, respectively.
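
The "Dataset Splits" row quotes a rough 8:1:1 train/validation/test split. As an illustration only, here is a minimal sketch of such a split, assuming a plain seeded random shuffle over a list of items. The excerpts above do not say how the authors actually partitioned the data (for example, whether statements about the same table stay in one split), and `split_8_1_1` is a hypothetical helper name, not something taken from the paper's repository.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle a copy of `items` and cut it into roughly 8:1:1 parts.

    Hypothetical helper: a plain random split is an assumption,
    not the procedure documented for TabFact.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability

    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_val = n // 10

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


train, val, test = split_8_1_1(range(1000))
print(len(train), len(val), len(test))
```

Because the remainder after the train and validation cuts goes to the test split, no item is dropped when the total is not a multiple of ten, which is one way a split ends up only "roughly" 8:1:1.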