SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems
Authors: Leonid Iosipoi, Anton Vakhrushev
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an empirical study using public datasets which demonstrates that SketchBoost speeds up the training process of GBDT by up to over 40 times while achieving comparable or even better performance. |
| Researcher Affiliation | Collaboration | Leonid Iosipoi, Sber AI Lab and HSE University, Moscow, Russia (iosipoileonid@gmail.com); Anton Vakhrushev, Sber AI Lab, Moscow, Russia (btbpanda@gmail.com) |
| Pseudocode | No | The paper describes algorithms but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Py-Boost is available on GitHub. ... All the following experimental results and evaluation code are also available on GitHub. |
| Open Datasets | Yes | The experiments are conducted on 9 real-world publicly available datasets from Kaggle, OpenML, and Mulan for multiclass (4 datasets) and multilabel (3 datasets) classification and multitask regression (2 datasets). |
| Dataset Splits | Yes | If there is no official train/test split, we randomly split the data into training and test sets with ratio 80%-20%. Then each algorithm is trained with 5-fold cross-validation (the train folds are used to fit a model and the validation fold is used for early stopping). (A minimal code sketch of this split protocol appears after the table.) |
| Hardware Specification | No | The paper mentions that Py-Boost works "on GPU" and refers to "GPU-based SketchBoost", but it does not specify the GPU model, CPU, or any other specific hardware component used for the experiments; it only vaguely states "on GPU". |
| Software Dependencies | Yes | Primarily we compare SketchBoost with XGBoost (v1.6.0) and CatBoost (v1.0.5). ... Further, we also compare SketchBoost with TabNet (v3.1.1)... |
| Experiment Setup | Yes | For XGBoost, CatBoost, and TabNet, we do the hyperparameter optimization using the Optuna framework [Akiba, Sano, Yanase, Ohta, and Koyama, 2019]. For SketchBoost, we use the same hyperparameters as for CatBoost (to speed up the experiment; we do not expect that hyperparameters will vary much since we use the same single-tree approach). The sketch size k is iterated through the grid {1, 2, 5, 10, 20} (or through a subset of this grid with values less than the output dimension). (A hedged sketch of this tuning setup appears after the table.) |
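
The split-and-validate protocol quoted in the "Dataset Splits" row can be illustrated with standard tooling. Below is a minimal sketch, assuming scikit-learn and XGBoost (one of the baselines cited above); the random data, learner choice, and iteration/early-stopping budgets are placeholder assumptions, not the authors' evaluation code.

```python
# Minimal sketch of the quoted protocol: 80%/20% train/test split when no
# official split exists, then 5-fold CV where each validation fold drives
# early stopping. X, y and all budgets below are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.random((1000, 3))  # dummy multioutput target

# 80%/20% split used when a dataset has no official train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5-fold cross-validation: train folds fit the model,
# the held-out fold is used only for early stopping
models = []
for tr_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_train):
    model = XGBRegressor(
        n_estimators=1000,
        early_stopping_rounds=100,  # early stopping on the validation fold
        tree_method="hist",         # multioutput regression assumes xgboost >= 1.6 with "hist"
    )
    model.fit(
        X_train[tr_idx], y_train[tr_idx],
        eval_set=[(X_train[val_idx], y_train[val_idx])],
        verbose=False,
    )
    models.append(model)
```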
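
Similarly, the tuning procedure quoted in the "Experiment Setup" row can be sketched with Optuna. The example below is a hedged illustration only: the dummy dataset, the three tuned CatBoost parameters, the trial budget, and the output dimension are assumptions, and the final SketchBoost training call is omitted rather than guessed.

```python
# Hedged sketch of the quoted tuning setup: Optuna tunes a CatBoost baseline,
# and the resulting hyperparameters are reused for SketchBoost while the
# sketch size k is iterated over {1, 2, 5, 10, 20} (only values below the
# output dimension). All data and parameter ranges here are placeholders.
import numpy as np
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.integers(0, 10, size=1000)  # dummy 10-class target
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "depth": trial.suggest_int("depth", 4, 10),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 10.0, log=True),
    }
    model = CatBoostClassifier(iterations=200, loss_function="MultiClass", verbose=False, **params)
    model.fit(X_tr, y_tr, eval_set=(X_val, y_val), early_stopping_rounds=50)
    return log_loss(y_val, model.predict_proba(X_val), labels=np.arange(10))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
best_params = study.best_params  # reused for SketchBoost per the quote above

# Sketch size grid from the quote; only values below the output dimension are kept.
output_dim = 10  # placeholder
sketch_grid = [k for k in (1, 2, 5, 10, 20) if k < output_dim]
# For each k in sketch_grid, SketchBoost would then be trained with best_params
# (the actual SketchBoost/Py-Boost call is omitted here).
```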