An Inductive Bias for Tabular Deep Learning

Authors: Ege Beyazit, Jonathan Kozaczuk, Bo Li, Vanessa Wallace, Bilal Fadlallah

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation focuses on 3 key metrics: performance, rate of convergence, and the irregularity of the functions learned. Performance is evaluated using accuracy and the area under the receiver operating characteristic curve (AUROC). Rate of convergence is evaluated using the mean number of training epochs required to minimize validation loss. Finally, the irregularity of the functions learned by the neural network models is measured using the total high-frequency power in Equation 2 along the top principal components (PCs). We evaluate our proposed approach using 14 benchmark classification datasets listed in Table 2. (See the spectral-power sketch after the table.)
Researcher Affiliation | Industry | Ege Beyazit, Amazon (beyazit@amazon.com); Jonathan Kozaczuk, Amazon (jonkozac@amazon.com); Bo Li, Amazon (booli@amazon.com); Vanessa Wallace, Amazon (vwall@amazon.com); Bilal Fadlallah, Amazon (bhf@amazon.com)
Pseudocode | No | The paper provides implementation details with code snippets in Appendix H, but these are not presented as formal pseudocode or algorithm blocks.
Open Source Code | Yes | Implementation details to reproduce our results are provided in Appendix H.
Open Datasets | Yes | We evaluate our proposed approach using 14 benchmark classification datasets listed in Table 2. These datasets are used by [13] to demonstrate the performance gap between tree-based models and neural networks.

Table 2: 14 tabular datasets used in the experiments
Name | #Samples | #Features | Source
electricity [9] | 45312 | 9 | https://openml.org/d/151
house_16H | 22784 | 17 | https://openml.org/d/821
pol | 15000 | 49 | https://openml.org/d/722
kdd_ipums_la_97-small | 7019 | 61 | https://openml.org/d/993
MagicTelescope [9] | 19020 | 11 | https://openml.org/d/1120
bank-marketing [9] | 45211 | 17 | https://openml.org/d/1461
phoneme | 5404 | 6 | https://openml.org/d/1489
MiniBooNE [9] | 130064 | 51 | https://openml.org/d/41150
eye_movements [24] | 10936 | 28 | https://openml.org/d/1044
jannis | 83733 | 55 | https://openml.org/d/41168
california [19] | 20640 | 8 | https://www.dcc.fc.up.pt/ltorgo/Regression/cal_housing.html
albert | 425240 | 79 | https://openml.org/d/41147
credit card clients [9] | 30000 | 24 | https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Diabetes [9] | 768 | 9 | https://www.openml.org/search?type=data&sort=runs&id=37
Dataset Splits | Yes | We use 70% of each dataset for training, 15% for validation and 15% for testing. (See the loading and splitting sketch after the table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as particular GPU or CPU models.
Software Dependencies | No | The paper mentions software such as PyTorch, scikit-learn, and nfft, but does not provide version numbers for these dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We conduct hyperparameter tuning for every model across 100 configurations using the validation set. We use Hyperopt [4] to tune the hyperparameters of all approaches considered. Specifically, the parameter spaces considered for each model are shown in Tables 3, 4, and 5. Table 3: Hyperparameter space for MLPs. Batch size is set to 128 and is not tuned. PyTorch's [20] implementation of Adam [17] with its default parameters is used for optimization. (See the Hyperopt sketch after the table.)
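
The spectral-irregularity measurement quoted under Research Type can be illustrated with a short sketch. The paper's Equation 2 and exact pipeline are not reproduced here (the paper also mentions the nfft package, which suggests non-equispaced transforms may be used in the actual procedure); the version below is a minimal approximation, assuming the metric sums the power spectrum above a chosen frequency cutoff for the model's output evaluated on an equispaced 1-D sweep along each top principal component. The cutoff, grid size, sweep span, and function names are illustrative choices, not the authors'.

```python
import numpy as np
from sklearn.decomposition import PCA

def high_frequency_power(f, X, n_components=2, cutoff=5, n_grid=256, span=3.0):
    """Approximate total high-frequency power of f along top principal components.

    f            : callable mapping an (n, d) array to an (n,) array of outputs
    X            : (n, d) data used to estimate the principal directions
    n_components : number of top PCs to probe
    cutoff       : frequency index above which power counts as "high frequency"
    n_grid       : number of equispaced points along each 1-D sweep
    span         : sweep extent, in standard deviations, around the data mean
    """
    pca = PCA(n_components=n_components).fit(X)
    mean, powers = X.mean(axis=0), []
    for pc, scale in zip(pca.components_, np.sqrt(pca.explained_variance_)):
        # Sweep along the principal direction and evaluate the model on that line.
        t = np.linspace(-span * scale, span * scale, n_grid)
        line = mean[None, :] + t[:, None] * pc[None, :]
        y = np.asarray(f(line)).ravel()
        # Power spectrum of the 1-D restriction of f; keep only the high-frequency tail.
        spectrum = np.abs(np.fft.rfft(y - y.mean())) ** 2
        powers.append(spectrum[cutoff:].sum())
    return float(np.sum(powers))
```

For a PyTorch classifier, f could be a small wrapper that converts the array to a tensor, runs the model in eval mode, and returns the positive-class logit or probability.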
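The data handling quoted under Open Datasets and Dataset Splits can likewise be sketched. The OpenML id below (151, electricity) comes from Table 2; the use of fetch_openml, train_test_split, and a fixed random seed are assumptions about tooling rather than the authors' published pipeline, and no stratification or preprocessing is shown.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Fetch one of the benchmark datasets by its OpenML id (electricity = 151, per Table 2).
X, y = fetch_openml(data_id=151, as_frame=True, return_X_y=True)

# 70% train, 15% validation, 15% test; the seed is an arbitrary choice for illustration.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```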
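For the tuning protocol under Experiment Setup (100 configurations per model, scored on the validation set, Hyperopt for search, Adam with default parameters, batch size fixed at 128), a minimal Hyperopt loop might look like the following. The search space is a placeholder, since the actual ranges appear in the paper's Tables 3, 4, and 5, and train_and_evaluate is a hypothetical helper that trains an MLP and returns its best validation loss.

```python
from hyperopt import fmin, tpe, hp, Trials

# Placeholder search space; the paper's actual ranges are given in Tables 3-5.
space = {
    "hidden_size": hp.choice("hidden_size", [64, 128, 256, 512]),
    "n_layers": hp.choice("n_layers", [2, 3, 4]),
    "lr": hp.loguniform("lr", -9, -3),        # Adam learning rate; batch size fixed at 128
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

def objective(params):
    # train_and_evaluate is a hypothetical helper: it trains an MLP with Adam
    # (otherwise default parameters) and returns the best validation loss reached.
    return train_and_evaluate(params, batch_size=128)

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
```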