Gradient Boosted Decision Trees for High Dimensional Sparse Output

Authors: Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance."
Researcher Affiliation | Collaboration | Google Research, Mountain View, USA; University of California at Davis, Davis, USA; Microsoft, Mountain View, USA; Facebook, Menlo Park, USA; University of Texas at Austin, Austin, USA.
Pseudocode | Yes | Algorithm 1: GBDT-SPARSE tree node splitting algorithm.
Open Source Code | No | The paper links to LightGBM, a baseline method (https://github.com/Microsoft/LightGBM), but does not provide a link or any statement about the availability of the authors' own GBDT-SPARSE code.
Open Datasets | Yes | "Data: We conducted experiments on 5 standard and publicly available multi-label learning datasets. Table 2 shows the associated details." NUS-WIDE is available at http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm; all other datasets are available at http://manikvarma.org/downloads/XC/XMLRepository.html.
Dataset Splits | No | The paper reports 'training samples' and 'testing samples' in Table 2, but does not define a validation split or describe the methodology for creating one.
Hardware Specification | Yes | "All experiments are conducted on a machine with an Intel Xeon X5440 2.83GHz CPU and 32GB RAM. For PD-Sparse we use a similar machine with 192GB memory due to its large memory footprint. We run our algorithm with Delicious-200K on a 28-core dual-socket E5-2683v3 machine."
Software Dependencies | No | The paper lists baselines such as XGBoost, LightGBM, LEML, FASTXML, SLEEC, and PD-SPARSE, stating that the authors used "their highly optimized C++ implementation published along with the original papers," but does not provide version numbers for any software, including their own implementation.
Experiment Setup | Yes | "For our method, we kept most of the parameters fixed for all the datasets: hmax = 10, nleaf = 100, and λ = 5, where hmax and nleaf are the maximum level of the tree and the minimal number of data points in each leaf. Leaf node sparsity k was set to 100 for Delicious-200K and 20 for all others. This parameter can be very intuitively set as an increasing function of label set size. We hand tuned the projection dimensionality d and set it to 100 for Delicious and Wiki10-31K, and 50 for others."
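
The hyperparameters quoted in the Experiment Setup row can be summarized in a small helper. The snippet below is a minimal sketch in Python assembled from that quote alone; the names (COMMON_PARAMS, leaf_sparsity_k, projection_dim_d, config_for) and the dictionary layout are assumptions rather than part of the paper or any released code, and the value of d for Delicious-200K is inferred from the phrase "50 for others".

```python
# Minimal sketch of the per-dataset hyperparameters reported for GBDT-SPARSE.
# Names and structure are assumptions; the paper does not release an implementation.

# Parameters kept fixed across all datasets in the paper.
COMMON_PARAMS = {
    "hmax": 10,       # maximum level (depth) of each tree
    "nleaf": 100,     # minimal number of data points in each leaf
    "lambda_reg": 5,  # regularization weight (lambda in the paper)
}

def leaf_sparsity_k(dataset: str) -> int:
    # "Leaf node sparsity k was set to 100 for Delicious-200K and 20 for all others."
    return 100 if dataset == "Delicious-200K" else 20

def projection_dim_d(dataset: str) -> int:
    # "We hand tuned the projection dimensionality d and set it to 100 for
    #  Delicious and Wiki10-31K, and 50 for others."
    # (d for Delicious-200K is assumed to fall under "others", i.e. 50.)
    return 100 if dataset in ("Delicious", "Wiki10-31K") else 50

def config_for(dataset: str) -> dict:
    """Combine the shared and per-dataset hyperparameters for one dataset."""
    return {**COMMON_PARAMS,
            "k": leaf_sparsity_k(dataset),
            "d": projection_dim_d(dataset)}

if __name__ == "__main__":
    print(config_for("Wiki10-31K"))
    # {'hmax': 10, 'nleaf': 100, 'lambda_reg': 5, 'k': 20, 'd': 100}
```

Under the same rules, any other dataset in Table 2 would receive k = 20 and d = 50.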