Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Authors: Sergei Popov, Stanislav Morozov, Artem Babenko
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. |
| Researcher Affiliation | Collaboration | Sergei Popov (Yandex), sapopov@yandex-team.ru; Stanislav Morozov (Yandex; Lomonosov Moscow State University), stanis-morozov@yandex.ru; Artem Babenko (Yandex; National Research University Higher School of Economics), artem.babenko@phystech.edu |
| Pseudocode | No | The paper describes the architecture and processes using text and diagrams (Figure 1 and 2), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | The PyTorch implementation of NODE is available online. [Footnote 1: https://github.com/Qwicen/node] |
| Open Datasets | Yes | We perform most of the experiments on six open-source tabular datasets from different domains: Epsilon, Year Prediction, Higgs, Microsoft, Yahoo, Click. The detailed description of the datasets is available in appendix. All the datasets provide train/test splits, and we used 20% samples from the train set as a validation set to tune the hyperparameters. [Footnotes provide specific URLs for each dataset, e.g., Epsilon: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html] |
| Dataset Splits | Yes | All the datasets provide train/test splits, and we used 20% samples from the train set as a validation set to tune the hyperparameters. For each dataset, we fix the train/val/test splits for a fair comparison. ... In order to tune the hyperparameters, we performed a random stratified split of full training data into train set (80%) and validation set (20%) for the Epsilon, Year Prediction, Higgs, Microsoft, and Click datasets. (A minimal split sketch is given below the table.) |
| Hardware Specification | Yes | Our GPU setup has a single 1080Ti GPU and 2 CPU cores. In turn, our CPU setup has a 28-core Xeon E5-2660 v4 processor (which costs almost twice as much as the GPU). |
| Software Dependencies | Yes | We use CatBoost v0.15 and XGBoost v0.90 as baselines, while NODE inference runs on PyTorch v1.1.0. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | For the classification datasets (Epsilon, Higgs, Click), we minimize cross-entropy loss and report the classification error. For the regression and ranking datasets (Year Prediction, Microsoft, Yahoo), we minimize and report mean squared error... As an optimization method, we use the recent Quasi-Hyperbolic Adam with parameters recommended in the original paper (Ma & Yarats, 2018). ... Neural Oblivious Decision Ensembles were tuned by grid search over the following hyperparameter values. In the multi-layer NODE, we use the same architecture for all layers, i.e., the same number of trees of the same depth. ... num layers: {2, 4, 8}; total tree count: {1024, 2048}; tree depth: {6, 8}; tree output dim: {2, 3} ... We always use learning rate 10^-3. (Optimizer and grid-search sketches follow the table.) |
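The 80/20 validation split quoted in the Dataset Splits row can be reproduced with a standard stratified hold-out. The sketch below is a minimal illustration, assuming scikit-learn is available; the arrays are random placeholders rather than one of the actual datasets.

```python
# Minimal sketch of the validation split described in the paper: 20% of the
# official train split is held out, stratified by label for classification.
# The arrays below are random placeholders, not one of the actual datasets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 28)            # placeholder feature matrix
y = np.random.randint(0, 2, size=1000)  # placeholder binary labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,    # 20% of the train set becomes the validation set
    stratify=y,       # stratified split, as described in the paper
    random_state=0,   # fixed seed so the split stays identical across runs
)
```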
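The Software Dependencies row pins exact package versions. A quick check like the one below, assuming the packages are importable under their usual names, makes it easy to confirm the environment matches.

```python
# Print installed versions to compare against the ones quoted above
# (CatBoost v0.15, XGBoost v0.90, PyTorch v1.1.0).
import catboost
import torch
import xgboost

print("catboost:", catboost.__version__)
print("xgboost:", xgboost.__version__)
print("torch:", torch.__version__)
```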
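For the optimizer, the Experiment Setup row only states that Quasi-Hyperbolic Adam (Ma & Yarats, 2018) is used with the parameters recommended in the original paper and a learning rate of 10^-3. A sketch using the qhoptim package could look as follows; the nus/betas values are illustrative assumptions rather than values confirmed by the NODE paper, and the model is a stand-in.

```python
# Sketch of the optimizer setup with QHAdam from the qhoptim package.
import torch
from qhoptim.pyt import QHAdam

model = torch.nn.Linear(100, 1)  # placeholder module instead of a real NODE network
optimizer = QHAdam(
    model.parameters(),
    lr=1e-3,               # learning rate fixed to 10^-3 in all experiments
    nus=(0.7, 1.0),        # assumption: illustrative quasi-hyperbolic discount factors
    betas=(0.995, 0.999),  # assumption: illustrative beta values
)
```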
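The grid itself is small enough to enumerate exhaustively. The sketch below uses only the grid values and the fixed learning rate from the quote; `train_and_evaluate` is a hypothetical stub standing in for a full NODE training run that returns a validation metric (lower is better).

```python
from itertools import product

grid = {
    "num_layers": [2, 4, 8],
    "total_tree_count": [1024, 2048],
    "tree_depth": [6, 8],
    "tree_output_dim": [2, 3],
}
learning_rate = 1e-3  # fixed for every configuration

def train_and_evaluate(config, lr):
    """Hypothetical stand-in for training NODE with `config` and returning a
    validation metric; replace with a real training and evaluation loop."""
    return 0.0

best_config, best_score = None, float("inf")
for values in product(*grid.values()):
    config = dict(zip(grid, values))
    # The same architecture is used in every layer, so the total tree budget
    # is divided evenly across layers.
    config["trees_per_layer"] = config["total_tree_count"] // config["num_layers"]
    score = train_and_evaluate(config, lr=learning_rate)
    if score < best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```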