Table-to-Text: Describing Table Region With Natural Language
Authors: Junwei Bao, Duyu Tang, Nan Duan, Zhao Yan, Yuanhua Lv, Ming Zhou, Tiejun Zhao
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the accuracy of the model and the power of the copying mechanism. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively. Furthermore, we introduce an open-domain dataset WIKITABLETEXT including 13,318 explanatory sentences for 4,962 tables. Our model achieves a BLEU-4 score of 38.23, which outperforms template based and language model based approaches. A hedged BLEU-4 evaluation sketch appears after the table. |
| Researcher Affiliation | Collaboration | Harbin Institute of Technology, Harbin, China; Microsoft Research, Beijing, China; Beihang University, Beijing, China; Microsoft AI and Research, Sunnyvale, CA, USA |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'We release an open-domain dataset WIKITABLETEXT', but it does not provide an explicit statement or link for the source code of the described methodology. |
| Open Datasets | Yes | Furthermore, we introduce an open-domain dataset, WIKITABLETEXT, including 13,318 explanatory sentences for 4,962 tables. We release an open-domain dataset WIKITABLETEXT, and hope that it can offer opportunities to further research in this area. WIKIBIO is introduced by (Lebret, Grangier, and Auli 2016) for generating biography to describe an infobox. We follow (Serban et al. 2016) to generate questions from knowledge base (KB) facts on SIMPLEQUESTIONS (Bordes et al. 2015). |
| Dataset Splits | Yes | The paper reports splits for all three datasets. WIKITABLETEXT: 'We randomly split the entire dataset into training (10,000), development (1,318), and test (2,000) sets.' WIKIBIO: 'The corpus contains 728,321 instances, which has been divided into three sub-parts to provide 582,659 for training, 72,831 for validation and 72,831 for testing.' SIMPLEQUESTIONS: 'The dataset is split into three parts: 75,910 for training, 10,845 for validation, and 20,687 for test.' A hedged sketch of the WIKITABLETEXT split appears after the table. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Ada-delta' as an adaptive learning rate method but does not list any specific software libraries, frameworks, or their version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn). |
| Experiment Setup | Yes | We randomly initialize the parameters in our model with a Gaussian distribution, set the dimension of the word/attribute embedding as 300, and set the dimension of the decoder hidden state as 500. We adopt Ada-delta (Zeiler 2012) to adapt the learning rate. A dev set is used to halve the learning rate when the performance on the dev set does not improve for 6 consecutive epochs. We update parameters in an end-to-end fashion using back-propagation. In the inference process, we use beam search and set the beam size as 5. A hedged sketch of this training configuration appears after the table. |
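
All results above are reported as BLEU-4 scores. The paper does not name its evaluation script, so the following is a minimal sketch, assuming NLTK's sentence-level BLEU with standard 4-gram weights; the example reference, hypothesis, and tokenization are hypothetical.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference sentence and generated hypothesis for one table row;
# both are whitespace-tokenized. Smoothing avoids zero scores on short sentences.
reference = "michael phelps won the gold medal in the 200 m butterfly".split()
hypothesis = "michael phelps won gold in the 200 m butterfly".split()

bleu4 = sentence_bleu(
    [reference],                       # list of reference token lists
    hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),  # equal 1- to 4-gram weights, i.e. BLEU-4
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {bleu4:.4f}")
```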
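
The Dataset Splits row quotes a random 10,000 / 1,318 / 2,000 split of WIKITABLETEXT's 13,318 table-sentence pairs. Below is a minimal sketch of such a split; the function name, input format, and random seed are assumptions, since the paper does not release its splitting script.

```python
import random

def split_wikitabletext(examples, seed=42):
    """Randomly split the 13,318 WIKITABLETEXT pairs into train/dev/test
    sets of sizes 10,000 / 1,318 / 2,000, as reported in the paper."""
    assert len(examples) == 13318, "expected 13,318 table-sentence pairs"
    rng = random.Random(seed)   # seed is an assumption, not from the paper
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:10000], shuffled[10000:11318], shuffled[11318:]
```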
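
The hyperparameters quoted in the Experiment Setup row can be wired together as in the sketch below. This is illustrative only, assuming PyTorch, a GRU decoder cell, a placeholder vocabulary size, and a Gaussian standard deviation of 0.1; none of these framework-level details are stated in the paper.

```python
import torch
from torch import nn

# Sketch of the reported setup: Gaussian initialization, 300-d word/attribute
# embeddings, 500-d decoder hidden state, Adadelta, learning-rate halving when
# the dev score stalls, and beam size 5 at inference (beam search itself not shown).
VOCAB_SIZE = 20000   # placeholder; vocabulary size is not stated in the paper
EMBED_DIM = 300      # word/attribute embedding dimension (from the paper)
HIDDEN_DIM = 500     # decoder hidden-state dimension (from the paper)
BEAM_SIZE = 5        # beam size used during inference (from the paper)

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
decoder_cell = nn.GRUCell(EMBED_DIM, HIDDEN_DIM)   # GRU decoder cell is an assumption

# Random Gaussian initialization of all parameters (std 0.1 is an assumption).
for module in (embedding, decoder_cell):
    for p in module.parameters():
        nn.init.normal_(p, mean=0.0, std=0.1)

params = list(embedding.parameters()) + list(decoder_cell.parameters())
optimizer = torch.optim.Adadelta(params)

# Halve the learning rate when the dev metric stops improving; patience=6
# approximates the paper's six-epoch rule. Call scheduler.step(dev_score)
# once per epoch with the dev-set score.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=6
)
```

During training, one would call `scheduler.step(dev_score)` after each epoch's dev evaluation so the plateau-based halving mirrors the rule quoted from the paper.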