GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data

Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
Researcher Affiliation | Academia | Sascha Marton (University of Mannheim, Germany, sascha.marton@uni-mannheim.de); Stefan Lüdtke (University of Rostock, Germany, stefan.luedtke@uni-rostock.de); Christian Bartelt (University of Mannheim, Germany, christian.bartelt@uni-mannheim.de); Heiner Stuckenschmidt (University of Mannheim, Germany, heiner.stuckenschmidt@uni-mannheim.de)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The method is available under: https://github.com/s-marton/GRANDE
Open Datasets | Yes | For our evaluation, we used a predefined collection of datasets that was selected based on objective criteria from OpenML Benchmark Suites and comprises a total of 19 binary classification datasets (see Table 5 for details). The selection process was adopted from Bischl et al. (2021). (A retrieval sketch is given below the table.)
Dataset Splits | Yes | Furthermore, we report the mean and standard deviation of the test performance over a 5-fold cross-validation to ensure reliable results. (An evaluation sketch is given below the table.)
Hardware Specification | Yes | For all methods, we used a single NVIDIA RTX A6000.
Software Dependencies | No | The paper mentions using Optuna for hyperparameter optimization and frameworks like XGBoost and CatBoost, but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For GRANDE, we used a batch size of 64 and early stopping after 25 epochs. Similar to NODE (Popov et al., 2019), GRANDE uses an Adam optimizer with stochastic weight averaging over 5 checkpoints (Izmailov et al., 2018) and a learning rate schedule that uses a cosine decay with optional warmup (Loshchilov & Hutter, 2016). We optimized the hyperparameters using Optuna (Akiba et al., 2019) with 250 trials... (A configuration sketch is given below the table.)
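
The Open Datasets row refers to a dataset collection drawn from OpenML Benchmark Suites, following the selection process of Bischl et al. (2021). Below is a minimal retrieval sketch using the openml Python package; the suite ID 99 (OpenML-CC18) and the binary-classification filter are assumptions for illustration, not the paper's exact selection procedure.

    # Sketch: listing candidate binary classification tasks from an OpenML
    # benchmark suite. Suite ID 99 (OpenML-CC18) is an assumption; the paper
    # applied additional selection criteria to arrive at its 19 datasets.
    import openml

    suite = openml.study.get_suite(99)
    binary_task_ids = []
    for task_id in suite.tasks:
        task = openml.tasks.get_task(task_id)
        dataset = openml.datasets.get_dataset(task.dataset_id)
        # Dataset qualities are reported as floats by the OpenML API.
        if dataset.qualities.get("NumberOfClasses") == 2:
            binary_task_ids.append(task_id)

    print(f"{len(binary_task_ids)} binary classification tasks found")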
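
The Dataset Splits row states that test performance is reported as mean and standard deviation over a 5-fold cross-validation. A minimal sketch of that protocol follows; XGBoost stands in as a placeholder model (it is one of the baselines named in the paper), and macro F1 is an assumed metric.

    # Sketch: 5-fold cross-validation reporting mean and standard deviation
    # of the test metric. The XGBoost model and the macro F1 metric are
    # placeholders; GRANDE itself would be trained inside the same loop.
    import numpy as np
    from sklearn.metrics import f1_score
    from sklearn.model_selection import StratifiedKFold
    from xgboost import XGBClassifier

    def cross_validate(X, y, n_splits=5, seed=42):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = []
        for train_idx, test_idx in skf.split(X, y):
            model = XGBClassifier()
            model.fit(X[train_idx], y[train_idx])
            preds = model.predict(X[test_idx])
            scores.append(f1_score(y[test_idx], preds, average="macro"))
        return float(np.mean(scores)), float(np.std(scores))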
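
The Experiment Setup row describes a batch size of 64, early stopping after 25 epochs, Adam with a cosine-decay learning-rate schedule, and an Optuna search with 250 trials. The sketch below wires these stated values into a Keras/Optuna loop; the small dense network and the searched hyperparameters are placeholders rather than GRANDE or its actual search space, and stochastic weight averaging over 5 checkpoints as well as the optional warmup are omitted.

    # Sketch of the stated experiment setup: batch size 64, early stopping
    # (interpreted as a patience of 25 epochs), Adam with a cosine-decay
    # learning-rate schedule, and an Optuna study with 250 trials.
    import optuna
    import tensorflow as tf
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic placeholder data instead of the OpenML benchmark datasets.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=0)

    def objective(trial):
        # Hypothetical search space for illustration only.
        learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
        units = trial.suggest_int("units", 32, 256)

        schedule = tf.keras.optimizers.schedules.CosineDecay(
            initial_learning_rate=learning_rate, decay_steps=5000)
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(units, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
                      loss="binary_crossentropy", metrics=["accuracy"])

        early_stopping = tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=25, restore_best_weights=True)
        model.fit(X_train, y_train,
                  validation_data=(X_val, y_val),
                  batch_size=64, epochs=1000,
                  callbacks=[early_stopping], verbose=0)

        _, val_accuracy = model.evaluate(X_val, y_val, verbose=0)
        return val_accuracy

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=250)
    print(study.best_params)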