Faster Boosting with Smaller Memory
Authors: Julaiti Alafate, Yoav S. Freund
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory. |
| Researcher Affiliation | Academia | Julaiti Alafate, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093; Yoav Freund, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 |
| Pseudocode | Yes | We also provide the pseudo-code in Appendix C. |
| Open Source Code | No | The paper states 'The source code of the implementation is released at' but does not provide a specific, accessible URL or repository link in the provided text. |
| Open Datasets | Yes | We use two large datasets: one with 50 million examples (the human acceptor splice site dataset [18, 1]), the other with over 600 million examples (the bathymetry dataset [11]). |
| Dataset Splits | Yes | We performed an 80/20 random split for training and testing. We use the same training dataset of 50M samples as in the other work, and validate the model on the testing dataset of 4.6M samples. We use a training dataset of 623M samples, and validate the model on the testing dataset of 83M samples. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | The experiments on large datasets are all conducted on EC2 instances with attached SSD storage from Amazon Web Services. We ran the evaluations on five different instance types with increasing memory capacities, ranging from 8 GB to 244 GB (for details see Appendix A). |
| Software Dependencies | No | The paper mentions using the 'Rust programming language' and refers to 'XGBoost and LightGBM' as baselines, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For both methods, we generate trees with depth 5 as weak rules. In all experiments, we grow trees with at most 4 leaves, or depth two. For XGBoost, we chose the approximate greedy algorithm, which is its fastest training method. LightGBM supports using sampling in the training, which they called Gradient-based One-Side Sampling (GOSS). We selected GOSS as the tree construction algorithm for LightGBM. In addition, we also enabled the option in LightGBM to reduce its memory footprint. (A configuration sketch follows the table.) |
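
As a concrete illustration of the 80/20 random split quoted in the Dataset Splits row, the following is a minimal sketch assuming scikit-learn; the placeholder arrays and random seed are not from the paper, which splits the much larger splice-site and bathymetry datasets.

```python
# Minimal sketch of an 80/20 random train/test split (placeholder data,
# not the splice-site or bathymetry datasets used in the paper).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1_000, 10)             # stand-in feature matrix
y = np.random.randint(0, 2, size=1_000)   # stand-in binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0   # 80% train / 20% test
)
```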
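
The Experiment Setup row quotes the baseline configurations only in prose. The sketch below shows one way to express those settings with the standard XGBoost and LightGBM Python parameter dictionaries; it is an approximation, not the authors' scripts. The quoted tree sizes come from different experiments in the paper, so pairing depth-5 trees with XGBoost and the 4-leaf setting with LightGBM here is purely illustrative, and the objective, learning rate, and the particular LightGBM memory-limiting option are assumptions.

```python
# Approximate baseline settings matching the quoted setup; values not quoted
# in the paper (objective, learning rate, boosting rounds) are placeholders.
xgb_params = {
    "tree_method": "approx",        # XGBoost's approximate greedy algorithm
    "max_depth": 5,                 # depth-5 trees as weak rules
    "objective": "binary:logistic", # assumption
    "eta": 0.1,                     # assumption
}

lgb_params = {
    "boosting": "goss",             # Gradient-based One-Side Sampling
    "num_leaves": 4,                # at most 4 leaves
    "max_depth": 2,                 # i.e. depth two
    "histogram_pool_size": 1024,    # one LightGBM memory-limiting option (assumption)
    "objective": "binary",          # assumption
    "learning_rate": 0.1,           # assumption
}

# Training would then look roughly like:
#   xgboost.train(xgb_params, dtrain, num_boost_round=...)
#   lightgbm.train(lgb_params, lgb_train, num_boost_round=...)
```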