Coresets for Decision Trees of Signals

Authors: Ibrahim Jubran, Ernesto Evgeniy Sanches Shayda, Ilan I Newman, Dan Feldman

NeurIPS 2021

Reproducibility assessment. Each variable below lists the assessed result, followed by the LLM response that quotes or summarizes the paper as evidence.

Research Type: Experimental
LLM Response: "Experimental results on sklearn and LightGBM show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x10, while keeping similar accuracy."

Researcher Affiliation: Academia
LLM Response: "Ibrahim Jubran, Department of Computer Science, University of Haifa, Israel (ibrahim.jub@gmail.com); Ernesto Evgeniy Sanches Shayda, Department of Computer Science, University of Haifa, Israel (ernestosanches@gmail.com); Ilan Newman, Department of Computer Science, University of Haifa, Israel (ilan@cs.haifa.ac.il); Dan Feldman, Department of Computer Science, University of Haifa, Israel (dannyf.post@gmail.com)"

Pseudocode: Yes
LLM Response: "Algorithm 1: SLICEPARTITION(D, σ) ... Algorithm 2: PARTITION(D, γ, σ) ... Algorithm 3: SIGNAL-CORESET(D, k, ε)"

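Only the three procedure signatures are listed; for orientation, here is a minimal, hypothetical Python skeleton of those interfaces. The function names, argument order, and the assumption that SIGNAL-CORESET drives the two partition routines are inferred from the listing above, not taken from the paper's pseudocode bodies.

```python
def slice_partition(D, sigma):
    """Algorithm 1 (signature only): partition the signal D given sigma."""
    raise NotImplementedError  # see Algorithm 1 in the paper / code in [35]

def partition(D, gamma, sigma):
    """Algorithm 2 (signature only): partition D given gamma and sigma."""
    raise NotImplementedError  # see Algorithm 2 in the paper / code in [35]

def signal_coreset(D, k, eps):
    """Algorithm 3 (signature only): return an eps-coreset of the signal D
    for decision trees with parameter k; presumably calls the routines above."""
    raise NotImplementedError  # see Algorithm 3 in the paper / code in [35]
```
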
Open Source Code: Yes
LLM Response: "Open source code for our algorithms [35]. ... [35] Jubran, Ibrahim and Sanches, Ernesto and Newman, Ilan and Feldman, Dan. Open source code for the algorithms presented in this paper, 2021. Link for open-source code."

Open Datasets: Yes
LLM Response: "Datasets. We used the following pair of datasets from the public UCI Machine Learning Repository [3], each of which was normalized to have zero mean and unit variance for every feature: (i) Air Quality Dataset [18] contains n = 9358 instances and m = 15 features; (ii) Gesture Phase Segmentation Dataset [45] contains n = 9900 instances and m = 18 features."

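The per-feature normalization described here (zero mean, unit variance) is standard scaling; a minimal sketch using sklearn's StandardScaler, assuming the dataset has been downloaded from the UCI repository to a local CSV file. The file name and the column selection are hypothetical, not from the paper.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical local file; the Air Quality dataset (n = 9358, m = 15)
# would be fetched separately from the UCI Machine Learning Repository.
df = pd.read_csv("air_quality.csv")

# Normalize every feature to zero mean and unit variance, as described.
X = StandardScaler().fit_transform(df.select_dtypes(include="number"))
```
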
Dataset Splits: No
LLM Response: The paper mentions training and testing data but does not describe a separate validation split or a cross-validation protocol for hyperparameter tuning; the test set appears to be used, implicitly, to evaluate the tuned parameters.

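Because no split is specified, any reproduction has to choose one. A minimal sketch of a plain train/test split follows; the 80/20 ratio, the fixed seed, and the synthetic placeholder data are assumptions made for illustration, not the paper's protocol.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a normalized dataset
# (dimensions mimic the Air Quality dataset: n = 9358, m = 15).
rng = np.random.default_rng(0)
X = rng.standard_normal((9358, 15))
y = rng.standard_normal(9358)

# Assumed 80/20 train/test split with a fixed seed; the paper does not
# state its split ratio or seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
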
Hardware Specification: Yes
LLM Response: "The hardware used was a standard MSI Prestige 14 laptop with an Intel Core i7-10710U and 16GB of RAM."

Software Dependencies: No
LLM Response: "We implemented our coreset construction from Algorithm 3 in Python 3.7, and in this section we evaluate its empirical results, both on synthetic and real-world datasets. ... We used the following common implementations: (i) the function RandomForestRegressor from the sklearn.ensemble package, and (ii) the function LGBMRegressor from the LightGBM package that implements a forest of gradient boosted trees."

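Only the two class names and their packages are given above; as a sketch, both regressors instantiated with default hyperparameters as in the reported setup. The synthetic data and the fit calls are illustrative, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor

# Synthetic stand-in data; in the paper these models are trained on the
# (coresets of the) normalized UCI datasets.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 15))
y = rng.standard_normal(1000)

# Default hyperparameters, matching the reported setup.
rf = RandomForestRegressor().fit(X, y)   # sklearn random forest
gbm = LGBMRegressor().fit(X, y)          # LightGBM gradient-boosted forest
```
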
Experiment Setup: Yes
LLM Response: "Both functions were used with their default hyperparameters, unless stated otherwise. ... To tune the hyperparameter k, we randomly generate a set K of possible values for k on a logarithmic scale."

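The quoted tuning procedure generates candidate values of k at random on a logarithmic scale; one plausible way to realize that is shown below. The bounds k_min and k_max and the number of candidates are assumptions, as the quoted text does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw candidate values of k uniformly in log-space between assumed bounds,
# then round to integers and deduplicate. Bounds and count are assumptions.
k_min, k_max, num_candidates = 10, 5000, 8
K = np.unique(
    np.round(np.exp(rng.uniform(np.log(k_min), np.log(k_max), size=num_candidates)))
).astype(int)
print(K)  # candidate values for the hyperparameter k
```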