Generation-Focused Table-Based Intermediate Pre-training for Free-Form Question Answering

Authors: Peng Shi, Patrick Ng, Feng Nan, Henghui Zhu, Jun Wang, Jiarong Jiang, Alexander Hanbo Li, Rishav Chakravarti, Donald Weidner, Bing Xiang, Zhiguo Wang

AAAI 2022, pp. 11312-11320

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on experimental results, models that leverage the GENTAP framework outperform existing baselines on the FeTaQA benchmark. The pre-trained models are useful not only for free-form question answering but also for the few-shot data-to-text generation task, showing good transfer ability and obtaining new state-of-the-art results.
Researcher Affiliation | Collaboration | Peng Shi (1*), Patrick Ng (2), Feng Nan (2), Henghui Zhu (2), Jun Wang (2), Jiarong Jiang (2), Alexander Hanbo Li (2), Rishav Chakravarti (2), Donald Weidner (2), Bing Xiang (2), Zhiguo Wang (2); (1) University of Waterloo, (2) AWS AI Labs; peng.shi@uwaterloo.ca, patricng@amazon.com
Pseudocode | No | The paper describes its methods in text and provides a grammar definition (Figure 2), but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that the source code for the proposed GENTAP framework or its models is publicly available, nor does it provide a direct link to a code repository for the implementation.
Open Datasets | Yes | We evaluate our model on the FeTaQA (Nan et al. 2021) dataset. We evaluate our model on the Data-to-Text generation dataset (FSD2T) (Chen et al. 2019).
Dataset Splits | Yes | The models are chosen based on the performance of the development set with 1000 examples. We evaluate our model on the FeTaQA (Nan et al. 2021) dataset. The SQuAD dataset is designed for the reading comprehension task, where (question, paragraph, short-form answer) triples are provided. We finetuned GENTAP on 50, 100, 300, 500, 1000 and 2000 sampled training examples. (A subset-sampling sketch based on these sizes follows the table.)
Hardware Specification | Yes | For intermediate pre-training, we use 8 Tesla V100 GPUs to train for at most 100K steps with an initial learning rate of 2e-5 and a batch size of 64. For FeTaQA dataset finetuning, 4 Tesla V100 GPUs are used to train the model, with an initial learning rate of 1e-5 and a batch size of 32. For FSD2T dataset finetuning, 1 Tesla V100 GPU is used to train with an initial learning rate of 1e-5 and a batch size of 8.
Software Dependencies | No | The paper refers to pre-trained models such as BART, T5, and GPT-2, but does not provide specific version numbers for software dependencies such as programming languages or libraries.
Experiment Setup | Yes | For intermediate pre-training, we use 8 Tesla V100 GPUs to train for at most 100K steps with an initial learning rate of 2e-5 and a batch size of 64. For FeTaQA dataset finetuning, 4 Tesla V100 GPUs are used to train the model, with an initial learning rate of 1e-5 and a batch size of 32. For FSD2T dataset finetuning, 1 Tesla V100 GPU is used to train with an initial learning rate of 1e-5 and a batch size of 8. (A training-configuration sketch based on these reported settings follows the table.)
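
The Hardware Specification and Experiment Setup rows report concrete hyperparameters for three training stages. A minimal sketch that transcribes them into Python config objects is given below; the `TrainingConfig` dataclass, its field names, and the stage labels are illustrative choices for readability, not artifacts from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingConfig:
    """One training stage, with the hyperparameters reported in the table above."""
    stage: str
    num_v100_gpus: int        # number of Tesla V100 GPUs used
    learning_rate: float      # initial learning rate
    batch_size: int           # reported batch size
    max_steps: Optional[int]  # only reported for intermediate pre-training


# Values transcribed from the Hardware Specification / Experiment Setup rows.
INTERMEDIATE_PRETRAINING = TrainingConfig("intermediate-pretraining", 8, 2e-5, 64, 100_000)
FETAQA_FINETUNING = TrainingConfig("fetaqa-finetuning", 4, 1e-5, 32, None)
FSD2T_FINETUNING = TrainingConfig("fsd2t-finetuning", 1, 1e-5, 8, None)

if __name__ == "__main__":
    for cfg in (INTERMEDIATE_PRETRAINING, FETAQA_FINETUNING, FSD2T_FINETUNING):
        print(cfg)
```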
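
The Dataset Splits row lists few-shot training sizes from 50 to 2000 examples alongside a 1000-example development set. The sketch below shows one plausible way to draw such subsets; the function name `sample_few_shot_splits`, the fixed seed, and the toy data are assumptions for illustration, since the quoted text does not specify a sampling procedure.

```python
import random
from typing import List, Sequence, Tuple

# Few-shot training sizes reported in the Dataset Splits row.
FEW_SHOT_SIZES = (50, 100, 300, 500, 1000, 2000)


def sample_few_shot_splits(
    train_examples: Sequence[dict],
    sizes: Sequence[int] = FEW_SHOT_SIZES,
    seed: int = 0,  # assumed seed; not reported in the quoted text
) -> List[Tuple[int, List[dict]]]:
    """Draw one random training subset per requested size."""
    rng = random.Random(seed)
    pool = list(train_examples)
    return [(size, rng.sample(pool, k=size)) for size in sizes]


if __name__ == "__main__":
    # Toy stand-in; a real run would load FeTaQA or FSD2T training examples instead.
    toy_train = [{"id": i} for i in range(5000)]
    for size, subset in sample_few_shot_splits(toy_train):
        print(f"sampled {len(subset)} of requested {size} examples")
```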