A Hybrid Probabilistic Approach for Table Understanding

Authors: Kexuan Sun, Harsha Rayudu, Jay Pujara (pp. 4366-4374)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The evaluation results show that our system achieves the state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up to 7% of macro F1 score. In this section, we present the experimental evaluation of the proposed system based on the four datasets.
Researcher Affiliation | Academia | Kexuan Sun, Harsha Rayudu, Jay Pujara; University of Southern California, Information Sciences Institute; kexuansu@usc.edu, hrayudu@usc.edu, jpujara@isi.edu
Pseudocode | Yes | Algorithm 1: Candidate Block Generation
Open Source Code | Yes | https://github.com/kianasun/table-understanding-system
Open Datasets | Yes | Most existing benchmark datasets (such as DeEx (Eberius et al. 2013), SAUS (Chen and Cafarella 2013) and CIUS (Ghasemi-Gol, Pujara, and Szekely 2019)) consist of only Excel files, are from narrow domains and cover only cell functional types. We introduce a new benchmark dataset comprised of 431 tables downloaded from the U.S. Government's open data portal (https://www.data.gov/).
Dataset Splits | Yes | In all experiments, we perform 5-fold cross-validation on the rest of the tables: for each dataset, we randomly split the tables into 5 folds, train/validate a model using 4 folds and test on 1 fold. For the 4 training folds, we randomly split the tables with a 9:1 ratio into training and validation sets. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software libraries such as the scikit-learn library (Buitinck et al. 2013), the GridCRF class from the pystruct library (Müller and Behnke 2014), and the pytorch library (Paszke et al. 2019). While these citations indicate when the software was published, explicit version numbers (e.g., 'PyTorch 1.9') are not provided. (A version-logging snippet follows the table.)
Experiment Setup | Yes | We select n_estimators among [100, 300], max_depth among [5, 50, None], min_samples_split among [2, 10] and min_samples_leaf among [1, 10]. We use the bootstrap mode with balanced sub-sampling. We set the batch size to 32, the learning rate to 0.0001, and the number of epochs to 50. We use cross-entropy loss. (See the configuration sketch after the table.)
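
The split protocol in the Dataset Splits row can be made concrete with a short sketch. The snippet below is a minimal illustration, assuming the tables of a dataset are held in a Python list; the 5-fold outer loop and the 9:1 train/validation split mirror the quoted description, while the random seed, function name, and variable names are assumptions, not taken from the released code.

```python
from sklearn.model_selection import KFold, train_test_split

def five_fold_splits(tables, seed=0):
    """Yield (train, validation, test) table lists following the quoted protocol:
    4 folds are further split 9:1 into train/validation, the remaining fold is
    the test set. The seed and helper structure are assumptions."""
    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_val_idx, test_idx in kfold.split(tables):
        train_val = [tables[i] for i in train_val_idx]
        test = [tables[i] for i in test_idx]
        # 9:1 split of the 4 remaining folds into training and validation sets
        train, val = train_test_split(train_val, test_size=0.1, random_state=seed)
        yield train, val, test
```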
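Because the paper cites scikit-learn, pystruct, and pytorch without version numbers, anyone reproducing the results has to record their own environment. The snippet below is one hypothetical way to log installed versions; it does not recover the versions the authors actually used.

```python
# Print the versions of the libraries cited in the paper
# (scikit-learn, pystruct, pytorch). The authors' exact versions are unknown.
import sklearn
import pystruct
import torch

for name, module in [("scikit-learn", sklearn), ("pystruct", pystruct), ("pytorch", torch)]:
    print(f"{name}: {getattr(module, '__version__', 'unknown')}")
```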
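The Experiment Setup row mixes two configurations: a random-forest hyperparameter grid and a neural-network training setup. The sketch below restates both as code under stated assumptions: GridSearchCV, the 3-fold inner search, the placeholder linear model, and the Adam optimizer are illustrative choices; only the grid values, the balanced-subsample bootstrap, and the batch size, learning rate, epoch count, and cross-entropy loss come from the quoted text.

```python
import torch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Random-forest grid from the quoted setup; the 3-fold inner search is an assumption.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 50, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 10],
}
forest = RandomForestClassifier(bootstrap=True, class_weight="balanced_subsample")
forest_search = GridSearchCV(forest, param_grid, cv=3)
# forest_search.fit(train_features, train_labels)  # features/labels come from the pipeline

# Neural-network settings from the quoted setup. The linear model and the Adam
# optimizer are placeholders/assumptions; only batch size, learning rate, epoch
# count, and the cross-entropy loss are taken from the paper.
batch_size, learning_rate, num_epochs = 32, 1e-4, 50
model = torch.nn.Linear(16, 4)            # placeholder architecture, not the authors'
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = torch.nn.CrossEntropyLoss()
```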