A Hybrid Probabilistic Approach for Table Understanding
Authors: Kexuan Sun, Harsha Rayudu, Jay Pujara
AAAI 2021, pp. 4366-4374
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation results show that our system achieves the state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up to 7% of macro F1 score. In this section, we present the experimental evaluation of the proposed system based on the four datasets. |
| Researcher Affiliation | Academia | Kexuan Sun, Harsha Rayudu, Jay Pujara University of Southern California, Information Sciences Institute kexuansu@usc.edu, hrayudu@usc.edu, jpujara@isi.edu |
| Pseudocode | Yes | Algorithm 1: Candidate Block Generation |
| Open Source Code | Yes | https://github.com/kianasun/table-understanding-system |
| Open Datasets | Yes | Most existing benchmark datasets (such as DeEx (Eberius et al. 2013), SAUS (Chen and Cafarella 2013) and CIUS (Ghasemi-Gol, Pujara, and Szekely 2019)) consist of only Excel files, are from narrow domains and cover only cell functional types. We introduce a new benchmark dataset comprised of 431 tables downloaded from the U.S. Government's open data (https://www.data.gov/). |
| Dataset Splits | Yes | In all experiments, we perform 5-fold cross validation on the rest of the tables: for each dataset, we randomly split the tables into 5 folds, train/validate a model using 4 folds and test on 1 fold. For the 4 folds, we randomly split the tables with 9:1 ratio into training and validation sets. (A code sketch of this protocol follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software libraries such as 'scikit-learn library (Buitinck et al. 2013)', 'Grid CRF class from the pystruct library (Müller and Behnke 2014)', and 'pytorch library (Paszke et al. 2019)'. While these citations indicate the year of the software's publication or a specific version at that time, explicit version numbers (e.g., 'PyTorch 1.9') are not provided. |
| Experiment Setup | Yes | We select n_estimators among [100, 300], max_depth among [5, 50, None], min_samples_split among [2, 10] and min_samples_leaf among [1, 10]. We use the bootstrap mode with balanced sub-sampling. We set batch size to be 32, learning rate to be 0.0001, and epochs to be 50. We use cross entropy loss. (Sketches of both configurations follow the table.) |
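
The split protocol quoted in the Dataset Splits row can be made concrete with a short sketch. This is an illustrative reading, not the authors' released code: the `tables` array, the random seed, and the use of scikit-learn's `KFold` and `train_test_split` are all assumptions.

```python
# Sketch of the reported protocol: 5-fold CV over tables, with the 4
# training folds further split 9:1 into train and validation sets.
# `tables` is a hypothetical placeholder; the authors' data loading
# lives in their released repository.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

tables = np.arange(431)  # placeholder: one index per table (431 in the new dataset)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (trainval_idx, test_idx) in enumerate(kf.split(tables)):
    # 9:1 split of the four non-test folds into training and validation sets
    train_idx, val_idx = train_test_split(trainval_idx, test_size=0.1, random_state=0)
    # train on tables[train_idx], tune on tables[val_idx], test on tables[test_idx]
```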
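The Random Forest grid in the Experiment Setup row maps directly onto scikit-learn hyperparameters; "bootstrap mode with balanced sub-sampling" matches `bootstrap=True` with `class_weight="balanced_subsample"`. A minimal sketch follows, assuming a `GridSearchCV` wrapper and hypothetical features `X` and labels `y` (the paper does not state that `GridSearchCV` itself was used; macro F1 scoring is inferred from the reported metric):

```python
# Sketch of the reported Random Forest search space in scikit-learn terms.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 50, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 10],
}
clf = RandomForestClassifier(bootstrap=True, class_weight="balanced_subsample")
search = GridSearchCV(clf, param_grid, scoring="f1_macro")
# search.fit(X, y)  # X, y: hypothetical cell features and labels
```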
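The neural training settings in the same row (batch size 32, learning rate 0.0001, 50 epochs, cross-entropy loss) could look like the following in PyTorch. The model, the dataset, and the choice of the Adam optimizer are assumptions; the paper pins down only the four quoted values.

```python
# Sketch of the reported training configuration in PyTorch.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(64, 8)  # placeholder classifier; the real model is not specified here
dataset = TensorDataset(torch.randn(256, 64),            # placeholder features
                        torch.randint(0, 8, (256,)))     # placeholder labels

loader = DataLoader(dataset, batch_size=32, shuffle=True)  # batch size 32
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate 0.0001 (Adam assumed)
criterion = nn.CrossEntropyLoss()                          # cross entropy loss

for epoch in range(50):                                    # 50 epochs
    for features, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
```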