SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

Authors: Leonid Iosipoi, Anton Vakhrushev

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present an empirical study using public datasets which demonstrates that SketchBoost speeds up the training process of GBDT by up to over 40 times while achieving comparable or even better performance."
Researcher Affiliation | Collaboration | Leonid Iosipoi (Sber AI Lab and HSE University, Moscow, Russia; iosipoileonid@gmail.com) and Anton Vakhrushev (Sber AI Lab, Moscow, Russia; btbpanda@gmail.com)
Pseudocode | No | The paper describes its algorithms in prose but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | "Py-Boost is available on GitHub. ... All the following experimental results and evaluation code are also available on GitHub." (A minimal, hedged usage sketch follows the table.)
Open Datasets | Yes | "The experiments are conducted on 9 real-world publicly available datasets from Kaggle, OpenML, and Mulan for multiclass (4 datasets) and multilabel (3 datasets) classification and multitask regression (2 datasets)."
Dataset Splits | Yes | "If there is no official train/test split, we randomly split the data into training and test sets with ratio 80%-20%. Then each algorithm is trained with 5-fold cross-validation (the train folds are used to fit a model and the validation fold is used for early stopping)." (See the split and early-stopping sketch after the table.)
Hardware Specification | No | The paper states that Py-Boost works "on GPU" and refers to "GPU-based SketchBoost", but it does not specify the GPU model, CPU, or any other hardware used for the experiments.
Software Dependencies | Yes | "Primarily we compare SketchBoost with XGBoost (v1.6.0) and CatBoost (v1.0.5). ... Further, we also compare SketchBoost with TabNet (v3.1.1)..."
Experiment Setup | Yes | "For XGBoost, CatBoost, and TabNet, we do the hyperparameter optimization using the Optuna framework [Akiba, Sano, Yanase, Ohta, and Koyama, 2019]. For SketchBoost, we use the same hyperparameters as for CatBoost (to speed up the experiment; we do not expect that hyperparameters will vary much since we use the same single-tree approach). The sketch size k is iterated through the grid {1, 2, 5, 10, 20} (or through a subset of this grid with values less than the output dimension)." (See the tuning sketch after the table.)
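
To make the Open Source Code entry concrete, here is a minimal training sketch against the public Py-Boost package. The `GradientBoosting` class, string loss name, and `eval_sets` argument follow the repository's README as we understand it; the synthetic data and all hyperparameter values are illustrative assumptions, not the authors' experimental configuration.

```python
# Minimal Py-Boost training sketch (illustrative, not the authors' setup).
# Py-Boost runs on GPU, so this assumes a CUDA-capable device is available.
import numpy as np
from sklearn.model_selection import train_test_split
from py_boost import GradientBoosting

# Placeholder multitask-regression data with an 8-dimensional output.
X = np.random.rand(10_000, 50).astype(np.float32)
y = np.random.rand(10_000, 8).astype(np.float32)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoosting("mse")  # multitask regression loss
model.fit(X_tr, y_tr, eval_sets=[{"X": X_val, "y": y_val}])
pred = model.predict(X_val)  # shape (n_samples, 8)
```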
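
The Dataset Splits entry describes an 80%-20% train/test split (when no official split exists) followed by 5-fold cross-validation in which each validation fold drives early stopping. The sketch below illustrates that protocol with generic scikit-learn utilities and XGBoost as a stand-in estimator; it is not the authors' evaluation code, and the data and hyperparameter values are assumptions.

```python
# Sketch of the split protocol: 80/20 train/test, then 5-fold CV where each
# validation fold controls early stopping. Generic illustration only.
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from xgboost import XGBRegressor  # stand-in GBDT with early stopping

X = np.random.rand(5_000, 20)
y = np.random.rand(5_000, 4)  # multioutput target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = []
for tr_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_train):
    model = XGBRegressor(
        tree_method="hist",
        n_estimators=1000,
        early_stopping_rounds=100,  # the validation fold stops training
    )
    model.fit(
        X_train[tr_idx], y_train[tr_idx],
        eval_set=[(X_train[val_idx], y_train[val_idx])],
        verbose=False,
    )
    models.append(model)

# Test predictions averaged over the five fold models.
test_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```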
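
The Experiment Setup entry combines Optuna-based hyperparameter search for the baselines with a small grid over the sketch size k for SketchBoost. Below is a hedged sketch of both pieces: the Optuna search space, trial count, and synthetic multiclass data are illustrative assumptions; only the grid-filtering rule for k (keep values below the output dimension) comes from the quote above.

```python
# Hedged sketch of the tuning protocol: Optuna search for a baseline GBDT,
# plus the sketch-size grid restricted by the output dimension.
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(
    n_samples=5_000, n_features=20, n_informative=10, n_classes=6, random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the paper does not list its exact grid here.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = XGBClassifier(n_estimators=300, **params)
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    return log_loss(y_val, model.predict_proba(X_val))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best baseline params:", study.best_params)

# Sketch size k: the grid {1, 2, 5, 10, 20}, keeping only values below the
# output dimension (here 6 classes, so {1, 2, 5}).
n_outputs = len(set(y))
sketch_grid = [k for k in (1, 2, 5, 10, 20) if k < n_outputs]
print("Sketch sizes to try:", sketch_grid)
```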