Faster Boosting with Smaller Memory
Authors: Julaiti Alafate, Yoav S. Freund
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory. |
| Researcher Affiliation | Academia | Julaiti Alafate, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093; Yoav Freund, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 |
| Pseudocode | Yes | We also provide the pseudo-code in Appendix C. |
| Open Source Code | No | The paper states 'The source code of the implementation is released at' but does not provide a specific, accessible URL or repository link in the provided text. |
| Open Datasets | Yes | We use two large datasets: one with 50 million examples (the human acceptor splice site dataset [18, 1]), the other with over 600 million examples (the bathymetry dataset [11]). |
| Dataset Splits | Yes | We performed an 80/20 random split for training and testing. We use the same training dataset of 50M samples as in the other work, and validate the model on the testing dataset of 4.6M samples. We use a training dataset of 623M samples, and validate the model on the testing dataset of 83M samples. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | The experiments on large datasets are all conducted on EC2 instances with attached SSD storage from Amazon Web Services. We ran the evaluations on five different instance types with increasing memory capacities, ranging from 8 GB to 244 GB (for details see Appendix A). |
| Software Dependencies | No | The paper mentions using the 'Rust programming language' and refers to 'XGBoost and LightGBM' as baselines, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For both methods, we generate trees with depth 5 as weak rules. In all experiments, we grow trees with at most 4 leaves, or depth two. For XGBoost, we chose the approximate greedy algorithm, which is its fastest training method. LightGBM supports using sampling in the training, which they called Gradient-based One-Side Sampling (GOSS). We selected GOSS as the tree construction algorithm for LightGBM. In addition, we also enabled the option in LightGBM to reduce its memory footprint. (A configuration sketch follows the table.) |
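
As a concrete illustration of the 80/20 random split quoted in the Dataset Splits row, the following is a minimal sketch assuming scikit-learn; the placeholder arrays and random seed are not from the paper, which splits the much larger splice-site and bathymetry datasets.

```python
# Minimal sketch of an 80/20 random train/test split (placeholder data,
# not the splice-site or bathymetry datasets used in the paper).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1_000, 10)             # stand-in feature matrix
y = np.random.randint(0, 2, size=1_000)   # stand-in binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0   # 80% train / 20% test
)
```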
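
The Experiment Setup row quotes the baseline configurations only in prose. The sketch below shows one way to express those settings with the standard XGBoost and LightGBM Python parameter dictionaries; it is an approximation, not the authors' scripts. The quoted tree sizes come from different experiments in the paper, so pairing depth-5 trees with XGBoost and the 4-leaf setting with LightGBM here is purely illustrative, and the objective, learning rate, and the particular LightGBM memory-limiting option are assumptions.

```python
# Approximate baseline settings matching the quoted setup; values not quoted
# in the paper (objective, learning rate, boosting rounds) are placeholders.
xgb_params = {
    "tree_method": "approx",        # XGBoost's approximate greedy algorithm
    "max_depth": 5,                 # depth-5 trees as weak rules
    "objective": "binary:logistic", # assumption
    "eta": 0.1,                     # assumption
}

lgb_params = {
    "boosting": "goss",             # Gradient-based One-Side Sampling
    "num_leaves": 4,                # at most 4 leaves
    "max_depth": 2,                 # i.e. depth two
    "histogram_pool_size": 1024,    # one LightGBM memory-limiting option (assumption)
    "objective": "binary",          # assumption
    "learning_rate": 0.1,           # assumption
}

# Training would then look roughly like:
#   xgboost.train(xgb_params, dtrain, num_boost_round=...)
#   lightgbm.train(lgb_params, lgb_train, num_boost_round=...)
```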