Regularized Gradient Boosting

Authors: Corinna Cortes, Mehryar Mohri, Dmitry Storcheus

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide experimental results, demonstrating that our algorithm achieves significantly better out-of-sample performance on multiple datasets than the standard GB algorithm used with its regularization.
Researcher Affiliation | Collaboration | Corinna Cortes, Google Research, New York, NY 10011, corinna@google.com; Mehryar Mohri, Google & Courant Institute, New York, NY 10012, mohri@google.com; Dmitry Storcheus, Courant Institute & Google, New York, NY 10012, dstorcheus@google.com
Pseudocode | Yes | Algorithm 1 (RGB). Input: α = 0, F = 0
1: for t ∈ [1, T] do
2:   [t_1, ..., t_S] ← P
3:   for s ∈ [1, S] do
4:     h_s ← argmin_{h ∈ H_{t_s}} (1/m) Σ_{i=1}^m Φ(y_i, F - (1/C_{t_s}) L'_{t_s}(α) h)
5:   end for
6:   s ← argmin_{s ∈ [1, S]} (1/m) Σ_{i=1}^m Φ(y_i, F - (1/C_{t_s}) L'_{t_s}(α) h_s) + β Ω(h_{t_s})
7:   α ← α - (1/C_s) L'_s(α) e_{t_s}
8:   F ← F - (1/C_s) L'_s(α) h_s
9: end for
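The selection step in Algorithm 1 (fit one candidate per base-learner subfamily, then keep the candidate that minimizes the empirical loss of the updated ensemble plus β times its complexity penalty) can be illustrated with a short sketch. This is not the authors' implementation: the tree-based subfamilies, the normalized leaf-count stand-in for Ω(h), the fixed step size eta replacing the 1/C_s smoothness-based steps, and the rgb_fit/rgb_predict helper names are all simplifying assumptions made for illustration.

# Minimal sketch of an RGB-style selection loop under the assumptions stated above.
# Labels y are assumed to be in {-1, +1}; Φ is the logistic loss, as in the paper's setup.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logistic_loss(y, f):
    # Empirical logistic loss of ensemble scores f against labels y in {-1, +1}.
    return np.mean(np.log1p(np.exp(-y * f)))

def rgb_fit(X, y, leaf_grid=(2, 4, 8, 16), beta=0.1, eta=0.1, T=100):
    m = len(y)
    F = np.zeros(m)                 # current ensemble scores on the training sample
    ensemble = []                   # list of (weight, base learner) pairs
    max_leaves = max(leaf_grid)
    for _ in range(T):
        # Negative functional gradient of the logistic loss with respect to F.
        residual = y / (1.0 + np.exp(y * F))
        best = None
        for n_leaves in leaf_grid:  # one candidate per subfamily (here: leaf budget)
            h = DecisionTreeRegressor(max_leaf_nodes=n_leaves)
            h.fit(X, residual)
            step = eta * h.predict(X)
            omega = n_leaves / max_leaves  # placeholder complexity penalty in [0, 1]
            score = logistic_loss(y, F + step) + beta * omega
            if best is None or score < best[0]:
                best = (score, h, step)
        _, h, step = best
        ensemble.append((eta, h))
        F = F + step
    return ensemble

def rgb_predict(ensemble, X):
    # Sum of weighted base-learner predictions; sign gives the predicted class.
    return sum(w * h.predict(X) for w, h in ensemble)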
Open Source Code | No | No explicit statement or link providing access to the open-source code for the methodology described in this paper was found.
Open Datasets | Yes | Table 1 shows the classification errors on the test sets for the UCI datasets studied, for both RGB and GB; see Table 2 in the appendix for details on the datasets.
Table 2: Dataset Statistics.
Dataset | #Features | #Train | #Test
sonar [UCI] | 60 | 104 | 104
cancer [UCI] | 9 | 342 | 227
diabetes [UCI] | 8 | 468 | 300
ocr17 [LIBSVM] | 256 | 1686 | 422
ocr49 [LIBSVM] | 256 | 1686 | 422
mnist17 [LIBSVM] | 780 | 12665 | 3167
mnist49 [LIBSVM] | 780 | 12665 | 3167
higgs [UCI] | 28 | 88168 | 22042
Dataset Splits | Yes | The hyperparameters are chosen via 5-fold cross-validation, and the standard errors for the best set of hyperparameters are reported.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided.
Software Dependencies | No | The paper mentions using "the XGBOOST library" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For a given training sample, we normalize the regularization Ω(h) to be in [0, 1] and tune the RGB parameter β using a grid search over β ∈ {0.001, 0.01, 0.1, 0.3, 1}. We use the logistic loss as the per-instance loss Φ. For the complexity of these base classifiers we use the bound derived in Theorem 1. To define the subfamilies of base learners we impose a grid of size 7 on the maximum number of internal nodes, n ∈ {2, 4, 8, 16, 32, 64, 256}, and a grid of size 7 on λ ∈ {0.001, 0.01, 0.1, 0.5, 1, 2, 4}. Both GB and RGB are run for T = 100 boosting rounds. The hyperparameters are chosen via 5-fold cross-validation, and the standard errors for the best set of hyperparameters are reported. Specifically, we let the ℓ2-norm regularization parameter be in {0.001, 0.01, 0.1, 0.5, 1, 2, 4}, the maximum tree depth parameter in {1, 2, 3, 4, 5, 6, 7}, and the learning rate parameter in {0.001, 0.01, 0.1, 0.5, 1}.
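As a concrete illustration of the GB baseline tuning described above, the following sketch sets up the quoted grids with the xgboost scikit-learn wrapper and 5-fold cross-validation. The paper names the XGBOOST library but not this exact API, version, or parameter mapping, so XGBClassifier, GridSearchCV, and the X_train/y_train placeholders are assumptions for illustration.

# Hedged sketch of the baseline GB grid search, assuming xgboost's sklearn wrapper.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Grids quoted in the setup above: l2 regularization, maximum tree depth, learning rate.
param_grid = {
    "reg_lambda": [0.001, 0.01, 0.1, 0.5, 1, 2, 4],
    "max_depth": [1, 2, 3, 4, 5, 6, 7],
    "learning_rate": [0.001, 0.01, 0.1, 0.5, 1],
}

# T = 100 boosting rounds with the logistic loss.
gb = XGBClassifier(n_estimators=100, objective="binary:logistic")
search = GridSearchCV(gb, param_grid, cv=5)  # 5-fold cross-validation
# search.fit(X_train, y_train)  # X_train, y_train: placeholders for one of the datasets in Table 2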