SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

Authors: Leonid Iosipoi, Anton Vakhrushev

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present an empirical study using public datasets which demonstrates that SketchBoost speeds up the training process of GBDT by up to over 40 times while achieving comparable or even better performance."
Researcher Affiliation | Collaboration | Leonid Iosipoi (Sber AI Lab and HSE University, Moscow, Russia; iosipoileonid@gmail.com) and Anton Vakhrushev (Sber AI Lab, Moscow, Russia; btbpanda@gmail.com)
Pseudocode | No | The paper describes its algorithms in prose but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | "Py-Boost is available on GitHub. ... All the following experimental results and evaluation code are also available on GitHub." (A minimal, hedged usage sketch follows the table.)
Open Datasets | Yes | "The experiments are conducted on 9 real-world publicly available datasets from Kaggle, OpenML, and Mulan for multiclass (4 datasets) and multilabel (3 datasets) classification and multitask regression (2 datasets)."
Dataset Splits | Yes | "If there is no official train/test split, we randomly split the data into training and test sets with ratio 80%-20%. Then each algorithm is trained with 5-fold cross-validation (the train folds are used to fit a model and the validation fold is used for early stopping)." (See the split and early-stopping sketch after the table.)
Hardware Specification | No | The paper states that Py-Boost works "on GPU" and refers to "GPU-based SketchBoost", but it does not specify the GPU model, CPU, or any other hardware used for the experiments.
Software Dependencies | Yes | "Primarily we compare SketchBoost with XGBoost (v1.6.0) and CatBoost (v1.0.5). ... Further, we also compare SketchBoost with TabNet (v3.1.1)..."
Experiment Setup | Yes | "For XGBoost, CatBoost, and TabNet, we do the hyperparameter optimization using the Optuna framework [Akiba, Sano, Yanase, Ohta, and Koyama, 2019]. For SketchBoost, we use the same hyperparameters as for CatBoost (to speed up the experiment; we do not expect that hyperparameters will vary much since we use the same single-tree approach). The sketch size k is iterated through the grid {1, 2, 5, 10, 20} (or through a subset of this grid with values less than the output dimension)." (See the tuning sketch after the table.)
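
To make the Open Source Code entry concrete, here is a minimal training sketch against the public Py-Boost package. The `GradientBoosting` class, string loss name, and `eval_sets` argument follow the repository's README as we understand it; the synthetic data and all hyperparameter values are illustrative assumptions, not the authors' experimental configuration.

```python
# Minimal Py-Boost training sketch (illustrative, not the authors' setup).
# Py-Boost runs on GPU, so this assumes a CUDA-capable device is available.
import numpy as np
from sklearn.model_selection import train_test_split
from py_boost import GradientBoosting

# Placeholder multitask-regression data with an 8-dimensional output.
X = np.random.rand(10_000, 50).astype(np.float32)
y = np.random.rand(10_000, 8).astype(np.float32)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoosting("mse")  # multitask regression loss
model.fit(X_tr, y_tr, eval_sets=[{"X": X_val, "y": y_val}])
pred = model.predict(X_val)  # shape (n_samples, 8)
```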
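
The Dataset Splits entry describes an 80%-20% train/test split (when no official split exists) followed by 5-fold cross-validation in which each validation fold drives early stopping. The sketch below illustrates that protocol with generic scikit-learn utilities and XGBoost as a stand-in estimator; it is not the authors' evaluation code, and the data and hyperparameter values are assumptions.

```python
# Sketch of the split protocol: 80/20 train/test, then 5-fold CV where each
# validation fold controls early stopping. Generic illustration only.
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from xgboost import XGBRegressor  # stand-in GBDT with early stopping

X = np.random.rand(5_000, 20)
y = np.random.rand(5_000, 4)  # multioutput target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = []
for tr_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_train):
    model = XGBRegressor(
        tree_method="hist",
        n_estimators=1000,
        early_stopping_rounds=100,  # the validation fold stops training
    )
    model.fit(
        X_train[tr_idx], y_train[tr_idx],
        eval_set=[(X_train[val_idx], y_train[val_idx])],
        verbose=False,
    )
    models.append(model)

# Test predictions averaged over the five fold models.
test_pred = np.mean([m.predict(X_test) for m in models], axis=0)
```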
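
The Experiment Setup entry combines Optuna-based hyperparameter search for the baselines with a small grid over the sketch size k for SketchBoost. Below is a hedged sketch of both pieces: the Optuna search space, trial count, and synthetic multiclass data are illustrative assumptions; only the grid-filtering rule for k (keep values below the output dimension) comes from the quote above.

```python
# Hedged sketch of the tuning protocol: Optuna search for a baseline GBDT,
# plus the sketch-size grid restricted by the output dimension.
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(
    n_samples=5_000, n_features=20, n_informative=10, n_classes=6, random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the paper does not list its exact grid here.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = XGBClassifier(n_estimators=300, **params)
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    return log_loss(y_val, model.predict_proba(X_val))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best baseline params:", study.best_params)

# Sketch size k: the grid {1, 2, 5, 10, 20}, keeping only values below the
# output dimension (here 6 classes, so {1, 2, 5}).
n_outputs = len(set(y))
sketch_grid = [k for k in (1, 2, 5, 10, 20) if k < n_outputs]
print("Sketch sizes to try:", sketch_grid)
```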