DANets: Deep Abstract Networks for Tabular Data Classification and Regression
Authors: Jintai Chen, Kuanlun Liao, Yao Wan, Danny Z. Chen, Jian Wu (AAAI 2022, pp. 3930-3938)
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on seven real-world tabular datasets show that our ABSTLAY and DANETs are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. |
| Researcher Affiliation | Academia | Jintai Chen1, Kuanlun Liao1, Yao Wan2, Danny Z. Chen3, Jian Wu4,* 1 College of Computer Science and Technology, Zhejiang University, Hangzhou, China 2 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China 3 Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA 4 The First Affiliated Hospital, and Department of Public Health, Zhejiang University School of Medicine, Hangzhou, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/WhatAShot/DANet. |
| Open Datasets | Yes | We conduct experiments on seven open-source tabular datasets: Microsoft (Qin and Liu 2013), Year Prediction (Bertin-Mahieux et al. 2011), and Yahoo (Mohan et al. 2011) for regression; Forest Cover Type (https://www.kaggle.com/c/forest-cover-type-prediction/), Click (https://www.kaggle.com/c/kddcup2012-track2/), Epsilon (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#epsilon), and Cardiovascular Disease (https://www.kaggle.com/sulianova/cardiovascular-disease-dataset) for classification. |
| Dataset Splits | Yes | Most of the datasets provide train-test splits. For Click, we follow the train-test split provided by the open-source code of NODE (Popov et al. 2019). In all the experiments, we fix the train-test split for fair comparison. ... We used the official validation set of every dataset if it is given. On the datasets that do not provide official validation sets, we used stratified sampling to draw 20% of instances from the full training sets for validation. (See the split sketch after the table.) |
| Hardware Specification | Yes | All the experiments are run on NVIDIA Tesla V100. |
| Software Dependencies | No | The paper states: "We implement our various DANET architectures with PyTorch on Python 3.7." The Python version is specified, but the PyTorch version is not, so the criterion of fully versioned software dependencies is not met. |
| Experiment Setup | Yes | In training, the batch size is 8,192 with a ghost batch size of 256 in the ghost batch normalization layers; the learning rate is initially set to 0.008 and is decayed by 5% every 20 epochs. The optimizer is QHAdam (Ma and Yarats 2019) with default configurations except for a weight decay rate of 10^-5 and discount factors (0.8, 1.0). ... We set k0 = 5, d0 = 32, and d1 = 64 as default (see Fig. 2(b)). For datasets with large numbers of raw features (e.g., Yahoo with 699 features and Epsilon with 2K features), we set k0 = 8, d0 = 48, and d1 = 96. We use a dropout rate of 0.1 for all datasets except Forest Cover Type, which uses no dropout. (Minimal sketches of the ghost batch normalization and the optimizer configuration follow the table.) |
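
The 20% stratified validation split quoted in the Dataset Splits row can be reproduced with scikit-learn. This is a minimal sketch under our own assumptions: the synthetic arrays, feature/class counts, and random seed are placeholders, not values taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a training set that lacks an official
# validation split (54 features / 7 classes echo Forest Cover Type,
# but the arrays here are random placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 54))
y = rng.integers(0, 7, size=1000)

# Stratified sampling of 20% of the instances for validation; a fixed
# random_state keeps the train/validation split identical across runs.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
```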
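
The Experiment Setup row pairs a batch size of 8,192 with a ghost batch size of 256. A common way to implement ghost batch normalization in PyTorch is to compute normalization statistics over small "virtual" batches; the sketch below is our illustration of that idea, not the authors' exact layer.

```python
import torch
import torch.nn as nn

class GhostBatchNorm1d(nn.Module):
    """BatchNorm1d applied to 'virtual' sub-batches of a large batch."""

    def __init__(self, num_features: int, virtual_batch_size: int = 256):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # With batch size 8,192 and virtual batch size 256 this yields
        # 32 chunks, each normalized with its own batch statistics.
        n_chunks = max(1, x.size(0) // self.virtual_batch_size)
        return torch.cat([self.bn(c) for c in x.chunk(n_chunks, dim=0)], dim=0)

# Example: a full batch of 8,192 rows with 64 features.
out = GhostBatchNorm1d(64)(torch.randn(8192, 64))
```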
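
The optimizer and learning-rate schedule from the same row can be sketched with the `qhoptim` package's PyTorch QHAdam. Mapping the quoted discount factors (0.8, 1.0) to qhoptim's `nus` argument, the placeholder model, and the epoch budget are our assumptions, not details confirmed by the paper.

```python
import torch
from torch.optim.lr_scheduler import StepLR
from qhoptim.pyt import QHAdam  # pip install qhoptim

# Placeholder model; the actual DANet architecture is defined in the
# authors' repository (https://github.com/WhatAShot/DANet).
model = torch.nn.Linear(64, 2)

# Weight decay 1e-5 and discount factors (0.8, 1.0) as quoted above;
# the remaining QHAdam settings stay at their defaults.
optimizer = QHAdam(model.parameters(), lr=0.008,
                   nus=(0.8, 1.0), weight_decay=1e-5)

# "Decayed by 5% every 20 epochs": multiply the learning rate by 0.95
# once every 20 epochs.
scheduler = StepLR(optimizer, step_size=20, gamma=0.95)

for epoch in range(200):  # epoch budget is not quoted in the table
    # ... train one epoch with batch size 8,192 here ...
    scheduler.step()
```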