Net-DNF: Effective Deep Modeling of Tabular Data

Authors: Liran Katzir, Gal Elidan, Ran El-Yaniv

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present extensive experiments showing that Net-DNFs significantly and consistently outperform fully connected networks over tabular data. With relatively few hyperparameters, Net-DNFs open the door to practical end-to-end handling of tabular data using neural networks. We present ablation studies, which justify the design choices of Net-DNF including the inductive bias elements, namely, Boolean formulation, locality, and feature selection.
Researcher Affiliation | Collaboration | Liran Katzir lirank@google.com, Gal Elidan elidan@google.com, Ran El-Yaniv rani@cs.technion.ac.il
Pseudocode | Yes | Algorithm 1 (Grid Search Procedure), reproduced below:

    Algorithm 1: Grid Search Procedure
    Input: model, configurations_list
    results_list = []
    for i = 1 to n_partitions do
        val_scores_list = []
        test_scores_list = []
        train, val, test = read_data(partition_index=i)
        for c in configurations_list do
            trained_model = model.train(train_data=train, val_data=val, configuration=c)
            trained_model.load_weights_from_best_epoch()
            val_score = trained_model.predict(data=val)
            test_score = trained_model.predict(data=test)
            val_scores_list.append(val_score)
            test_scores_list.append(test_score)
        end
        best_val_index = get_index_of_best_val_score(val_scores_list)
        test_res = test_scores_list[best_val_index]
        results_list.append(test_res)
    end
    mean = mean(results_list)
    sem = standard_error_of_the_mean(results_list)
    Return: mean, sem
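The pseudocode maps almost directly onto Python. Below is a minimal runnable sketch of Algorithm 1; the `read_data` loader and the `model` interface (a `train(...)` call returning a trained model that exposes `load_weights_from_best_epoch()` and a `predict(data=...)` that yields a scalar score, higher being better) are hypothetical stand-ins, not the paper's actual API.

    import statistics

    def grid_search(model, configurations_list, read_data, n_partitions=5):
        # Sketch of Algorithm 1. `model` and `read_data` are hypothetical
        # stand-ins; `predict` is assumed to return a scalar score where
        # higher is better.
        results_list = []
        for i in range(1, n_partitions + 1):
            val_scores, test_scores = [], []
            train, val, test = read_data(partition_index=i)
            for c in configurations_list:
                trained = model.train(train_data=train, val_data=val, configuration=c)
                trained.load_weights_from_best_epoch()
                val_scores.append(trained.predict(data=val))
                test_scores.append(trained.predict(data=test))
            # Select the configuration by validation score; report its test score.
            best = max(range(len(val_scores)), key=val_scores.__getitem__)
            results_list.append(test_scores[best])
        mean = statistics.mean(results_list)
        sem = statistics.stdev(results_list) / len(results_list) ** 0.5
        return mean, sem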
Open Source Code | Yes | Our code is available at https://github.com/amramabutbul/DisjunctiveNormalFormNet.
Open Datasets | Yes | The datasets used in this study are from Kaggle competitions and OpenML (Vanschoren et al., 2014). A summary of these datasets appears in Appendix C. Table 4 (excerpt): A description of the tabular datasets.

    Dataset       | Features | Classes | Samples | Source | URL
    Otto Group    | 93       | 9       | 61.9k   | Kaggle | kaggle.com/c/otto-group-product-classification-challenge/overview
    Gesture Phase | 32       | 5       | 9.8k    | OpenML | openml.org/d/4538
Dataset Splits | Yes | Each dataset was first randomly divided into five folds in a way that preserved the original distribution. Based on these five folds, we created five partitions of the dataset as follows: each fold is used as the test set in one of the partitions, while the other folds are used as the training and validation sets. This way, each partition was 20% test, 10% validation, and 70% training.
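The described protocol is a standard stratified five-fold scheme. A minimal sketch with scikit-learn follows; the use of scikit-learn, and of numpy arrays for X and y, is an assumption, since the paper does not name its splitting tooling.

    from sklearn.model_selection import StratifiedKFold, train_test_split

    def make_partitions(X, y, seed=0):
        # Five stratified folds; each fold serves once as the 20% test set.
        # The remaining 80% is split into 70% train / 10% validation,
        # i.e. 1/8 of the remainder goes to validation.
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        partitions = []
        for rest_idx, test_idx in skf.split(X, y):
            train_idx, val_idx = train_test_split(
                rest_idx, test_size=0.125, stratify=y[rest_idx], random_state=seed)
            partitions.append((train_idx, val_idx, test_idx))
        return partitions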
Hardware Specification | Yes | All models were trained on NVIDIA Titan Xp GPUs (12 GB RAM).
Software Dependencies | No | The paper mentions TensorFlow as the implementation framework and PyTorch for TabNet, but does not provide specific version numbers for these or any other key software dependencies.
Experiment Setup | Yes | All results presented in this work were obtained using a massive grid search to optimize each model's hyperparameters. A detailed description of the grid search process can be found in Appendices D.1 and D.2. For Net-DNF we used an initial learning rate of 0.05. For FCN, we added the initial learning rate to the grid search with values of {0.05, 0.005, 0.0005}. Appendix D.3 provides extensive details on the grid parameters for Net-DNF, XGBoost, and FCN, including learning rates, max depth, dropout, and L2 lambda values.
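For concreteness, the configurations_list consumed by Algorithm 1 can be built as a Cartesian product over the grid axes. The sketch below uses only the FCN learning-rate values quoted above; the dropout axis is a placeholder, since the actual dropout, max-depth, and L2 lambda values appear only in Appendix D.3.

    from itertools import product

    grid = {
        "learning_rate": [0.05, 0.005, 0.0005],  # FCN values quoted above
        "dropout": [0.0, 0.25, 0.5],             # placeholder; real values in Appendix D.3
    }
    configurations_list = [dict(zip(grid, vals)) for vals in product(*grid.values())]
    # 3 x 3 = 9 candidate configurations in this toy grid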