Coresets for Decision Trees of Signals
Authors: Ibrahim Jubran, Ernesto Evgeniy Sanches Shayda, Ilan I Newman, Dan Feldman
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on sklearn and LightGBM show that applying our coresets on real-world datasets boosts the computation time of random forests and their parameter tuning by up to x10, while keeping similar accuracy. |
| Researcher Affiliation | Academia | Ibrahim Jubran, Department of Computer Science, University of Haifa, Israel, ibrahim.jub@gmail.com; Ernesto Evgeniy Sanches Shayda, Department of Computer Science, University of Haifa, Israel, ernestosanches@gmail.com; Ilan Newman, Department of Computer Science, University of Haifa, Israel, ilan@cs.haifa.ac.il; Dan Feldman, Department of Computer Science, University of Haifa, Israel, dannyf.post@gmail.com |
| Pseudocode | Yes | Algorithm 1: SLICEPARTITION(D, σ) ... Algorithm 2: PARTITION(D, γ, σ) ... Algorithm 3: SIGNAL-CORESET(D, k, ε) |
| Open Source Code | Yes | Open source code for our algorithms [35]. ... [35] Jubran, Ibrahim and Sanches, Ernesto and Newman, Ilan and Feldman, Dan. Open source code for the algorithms presented in this paper, 2021. Link for open-source code. |
| Open Datasets | Yes | Datasets. We used the following pair of datasets from the public UCI Machine Learning Repository [3], each of which was normalized to have zero mean and unit variance for every feature: (i) Air Quality Dataset [18], with n = 9358 instances and m = 15 features; (ii) Gesture Phase Segmentation Dataset [45], with n = 9900 instances and m = 18 features. (A per-feature standardization sketch appears after the table.) |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly describe a validation split or a cross-validation methodology for hyperparameter tuning; the test set appears to be used implicitly to evaluate the tuned parameters. |
| Hardware Specification | Yes | The hardware used was a standard MSI Prestige 14 laptop with an Intel Core i7-10710U and 16GB of RAM. |
| Software Dependencies | No | We implemented our coreset construction from Algorithm 3 in Python 3.7, and in this section we evaluate its empirical results, both on synthetic and real-world datasets. ... We used the following common implementations: (i) the function RandomForestRegressor from the sklearn.ensemble package, and (ii) the function LGBMRegressor from the LightGBM package, which implements a forest of gradient-boosted trees. (A minimal usage sketch of both regressors appears after the table.) |
| Experiment Setup | Yes | Both functions were used with their default hyperparameters, unless stated otherwise. ... To tune the hyperparameter k, we randomly generate a set K of possible values for k on a logarithmic scale. (A sketch of the log-scale sampling appears after the table.) |
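
The per-feature normalization quoted in the Open Datasets row is standard standardization. Below is a minimal sketch, assuming the chosen dataset has already been downloaded from the UCI repository as a purely numeric CSV; the file name is hypothetical:

```python
import pandas as pd

# Hypothetical file name; the actual downloads are on the UCI pages for
# the Air Quality and Gesture Phase Segmentation datasets.
X = pd.read_csv("air_quality.csv").to_numpy(dtype=float)

# Normalize every feature to zero mean and unit variance, as described
# in the paper's experimental setup.
X = (X - X.mean(axis=0)) / X.std(axis=0)
```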
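
The two off-the-shelf regressors named in the Software Dependencies row can be exercised as follows. This is a sketch on synthetic stand-in data, not the paper's pipeline; both estimators are left at their default hyperparameters, matching the Experiment Setup row:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor

# Synthetic stand-in data shaped like the Air Quality dataset
# (the experiments use the real UCI data instead).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 15))
y_train = X_train @ rng.standard_normal(15)

# (i) sklearn random forest, (ii) LightGBM forest of gradient-boosted
# trees, both with default hyperparameters.
rf = RandomForestRegressor().fit(X_train, y_train)
gbm = LGBMRegressor().fit(X_train, y_train)
```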
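
The paper only states that the candidate set K for the hyperparameter k is generated randomly on a logarithmic scale. A minimal sketch under an assumed range of [1, n] and an assumed candidate count of 10, neither of which is given in the excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9358  # e.g., the Air Quality dataset size

# Sample candidate k values uniformly in log-space over [1, n]; the
# bounds and the count of 10 are assumptions, not from the paper.
K = sorted({int(round(v)) for v in np.exp(rng.uniform(np.log(1), np.log(n), size=10))})
print(K)
```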