Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Authors: Sergei Popov, Stanislav Morozov, Artem Babenko
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. |
| Researcher Affiliation | Collaboration | Sergei Popov (Yandex), sapopov@yandex-team.ru; Stanislav Morozov (Yandex; Lomonosov Moscow State University), stanis-morozov@yandex.ru; Artem Babenko (Yandex; National Research University Higher School of Economics), artem.babenko@phystech.edu |
| Pseudocode | No | The paper describes the architecture and processes using text and diagrams (Figure 1 and 2), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | The PyTorch implementation of NODE is available online. [Footnote 1: https://github.com/Qwicen/node] |
| Open Datasets | Yes | We perform most of the experiments on six open-source tabular datasets from different domains: Epsilon, Year Prediction, Higgs, Microsoft, Yahoo, Click. The detailed description of the datasets is available in appendix. All the datasets provide train/test splits, and we used 20% samples from the train set as a validation set to tune the hyperparameters. [Footnotes provide specific URLs for each dataset, e.g., Epsilon: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html] |
| Dataset Splits | Yes | All the datasets provide train/test splits, and we used 20% samples from the train set as a validation set to tune the hyperparameters. For each dataset, we fix the train/val/test splits for a fair comparison. ... In order to tune the hyperparameters, we performed a random stratified split of full training data into train set (80%) and validation set (20%) for the Epsilon, Year Prediction, Higgs, Microsoft, and Click datasets. (A minimal split sketch is given below the table.) |
| Hardware Specification | Yes | Our GPU setup has a single 1080Ti GPU and 2 CPU cores. In turn, our CPU setup has a 28-core Xeon E5-2660 v4 processor (which costs almost twice as much as the GPU). |
| Software Dependencies | Yes | We use CatBoost v0.15 and XGBoost v0.90 as baselines, while NODE inference runs on PyTorch v1.1.0. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | For the classification datasets (Epsilon, Higgs, Click), we minimize cross-entropy loss and report the classification error. For the regression and ranking datasets (Year Prediction, Microsoft, Yahoo), we minimize and report mean squared error... As an optimization method, we use the recent Quasi-Hyperbolic Adam with parameters recommended in the original paper (Ma & Yarats, 2018). ... Neural Oblivious Decision Ensembles were tuned by grid search over the following hyperparameter values. In the multi-layer NODE, we use the same architecture for all layers, i.e., the same number of trees of the same depth. ... num layers: {2, 4, 8}; total tree count: {1024, 2048}; tree depth: {6, 8}; tree output dim: {2, 3} ... We always use learning rate 10^-3. (Optimizer and grid-search sketches follow the table.) |
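The 80/20 validation split quoted in the Dataset Splits row can be reproduced with a standard stratified hold-out. The sketch below is a minimal illustration, assuming scikit-learn is available; the arrays are random placeholders rather than one of the actual datasets.

```python
# Minimal sketch of the validation split described in the paper: 20% of the
# official train split is held out, stratified by label for classification.
# The arrays below are random placeholders, not one of the actual datasets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 28)            # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)  # placeholder binary labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,    # 20% of the train set becomes the validation set
    stratify=y,       # stratified split, as described in the paper
    random_state=0,   # fixed seed so the split stays identical across runs
)
```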
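The Software Dependencies row pins exact package versions. A quick check like the one below, assuming the packages are importable under their usual names, makes it easy to confirm the environment matches.

```python
# Print installed versions to compare against the ones quoted above
# (CatBoost v0.15, XGBoost v0.90, PyTorch v1.1.0).
import catboost
import torch
import xgboost

print("catboost:", catboost.__version__)
print("xgboost:", xgboost.__version__)
print("torch:", torch.__version__)
```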
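For the optimizer, the Experiment Setup row only states that Quasi-Hyperbolic Adam (Ma & Yarats, 2018) is used with the parameters recommended in the original paper and a learning rate of 10^-3. A sketch using the qhoptim package could look as follows; the nus/betas values are illustrative assumptions rather than values confirmed by the NODE paper, and the model is a stand-in.

```python
# Sketch of the optimizer setup with QHAdam from the qhoptim package.
import torch
from qhoptim.pyt import QHAdam

model = torch.nn.Linear(100, 1)  # placeholder module instead of a real NODE network
optimizer = QHAdam(
    model.parameters(),
    lr=1e-3,               # learning rate fixed to 10^-3 in all experiments
    nus=(0.7, 1.0),        # assumption: illustrative quasi-hyperbolic discount factors
    betas=(0.995, 0.999),  # assumption: illustrative beta values
)
```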
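The grid itself is small enough to enumerate exhaustively. The sketch below uses only the grid values and the fixed learning rate from the quote; `train_and_evaluate` is a hypothetical stub standing in for a full NODE training run that returns a validation metric (lower is better).

```python
from itertools import product

grid = {
    "num_layers": [2, 4, 8],
    "total_tree_count": [1024, 2048],
    "tree_depth": [6, 8],
    "tree_output_dim": [2, 3],
}
learning_rate = 1e-3  # fixed for every configuration

def train_and_evaluate(config, lr):
    """Hypothetical stand-in for training NODE with `config` and returning a
    validation metric; replace with a real training and evaluation loop."""
    return 0.0

best_config, best_score = None, float("inf")
for values in product(*grid.values()):
    config = dict(zip(grid, values))
    # The same architecture is used in every layer, so the total tree budget
    # is divided evenly across layers.
    config["trees_per_layer"] = config["total_tree_count"] // config["num_layers"]
    score = train_and_evaluate(config, lr=learning_rate)
    if score < best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```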