Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Regularized Gradient Boosting
Authors: Corinna Cortes, Mehryar Mohri, Dmitry Storcheus
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide experimental results, demonstrating that our algorithm achieves significantly better out-of-sample performance on multiple datasets than the standard GB algorithm used with its regularization. |
| Researcher Affiliation | Collaboration | Corinna Cortes Google Research New York, NY 10011 EMAIL Mehryar Mohri Google & Courant Institute New York, NY 10012 EMAIL Dmitry Storcheus Courant Institute & Google New York, NY 10012 EMAIL |
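| Pseudocode | Yes | Algorithm 1 RGB. Input: $\alpha = 0$, $F = 0$. 1: for $t \in [1, T]$ do 2: $[t_1, \ldots, t_S] \sim P$ 3: for $s \in [1, S]$ do 4: $h_s \leftarrow \operatorname{argmin}_{h \in H_{t_s}} \frac{1}{m} \sum_{i=1}^{m} \Phi\big(y_i, F - \frac{1}{C_{t_s}} L'_{t_s}(\alpha) h\big)$ 5: end for 6: $s = \operatorname{argmin}_{s \in [1,S]} \frac{1}{m} \sum_{i=1}^{m} \Phi\big(y_i, F - \frac{1}{C_{t_s}} L'_{t_s}(\alpha) h_s\big) + \beta \Omega(h_{t_s})$ 7: $\alpha \leftarrow \alpha - \frac{1}{C_s} L'_s(\alpha) e_{t_s}$ 8: $F \leftarrow F - \frac{1}{C_s} L'_s(\alpha) h_s$ 9: end for |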
| Open Source Code | No | No explicit statement or link providing access to the open-source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Table 1 shows the classification errors on the test sets for the UCI datasets studied, for both RGB and GB, see Table 2 in the appendix for details on the dataset. Table 2: Dataset Statistics (dataset [source]: #Features, #Train, #Test). sonar [UCI]: 60, 104, 104; cancer [UCI]: 9, 342, 227; diabetes [UCI]: 8, 468, 300; ocr17 [LIBSVM]: 256, 1686, 422; ocr49 [LIBSVM]: 256, 1686, 422; mnist17 [LIBSVM]: 780, 12665, 3167; mnist49 [LIBSVM]: 780, 12665, 3167; higgs [UCI]: 28, 88168, 22042. |
| Dataset Splits | Yes | The hyperparameters are chosen via 5-fold cross-validation, and the standard errors for the best set of hyperparameters reported. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions using "the XGBOOST library" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For a given training sample, we normalize the regularization Ω(h) to be in [0, 1] and tune the RGB parameter β using a grid search over β ∈ {0.001, 0.01, 0.1, 0.3, 1}. We use the logistic loss as the per-instance loss Φ. For the complexity of these base classifiers we use the bound derived in Theorem 1. To define the subfamilies of base learners we impose a grid of size 7 on the maximum number of internal nodes n ∈ {2, 4, 8, 16, 32, 64, 256} and a grid of size 7 on λ ∈ {0.001, 0.01, 0.1, 0.5, 1, 2, 4}. Both GB and RGB are run for T = 100 boosting rounds. The hyperparameters are chosen via 5-fold cross-validation, and the standard errors for the best set of hyperparameters reported. Specifically, we let the ℓ2 norm regularization parameter be in {0.001, 0.01, 0.1, 0.5, 1, 2, 4}, the maximum tree depth parameter in {1, 2, 3, 4, 5, 6, 7}, and the learning rate parameter in {0.001, 0.01, 0.1, 0.5, 1}. |
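
To make the size of the search space in the Experiment Setup row concrete, the following is a minimal sketch that enumerates the quoted grids and builds 5-fold index splits. The function names are hypothetical. Reading the second grid (ℓ2 norm, tree depth, learning rate) as the one for the GB baseline is an assumption, and so is the use of contiguous, unshuffled folds, since the quoted text does not say how folds were formed.

```python
from itertools import product

# Grids quoted in the Experiment Setup row.
BETA_GRID = [0.001, 0.01, 0.1, 0.3, 1]
MAX_NODES_GRID = [2, 4, 8, 16, 32, 64, 256]
LAMBDA_GRID = [0.001, 0.01, 0.1, 0.5, 1, 2, 4]
L2_GRID = [0.001, 0.01, 0.1, 0.5, 1, 2, 4]
DEPTH_GRID = [1, 2, 3, 4, 5, 6, 7]
LEARNING_RATE_GRID = [0.001, 0.01, 0.1, 0.5, 1]
NUM_ROUNDS = 100  # T = 100 boosting rounds for both GB and RGB
NUM_FOLDS = 5


def rgb_subfamilies():
    """One base-learner subfamily per (max internal nodes, lambda) pair."""
    return list(product(MAX_NODES_GRID, LAMBDA_GRID))


def baseline_configurations():
    """(l2, depth, learning rate) triples; assumed to be the GB baseline grid."""
    return list(product(L2_GRID, DEPTH_GRID, LEARNING_RATE_GRID))


def k_fold_indices(num_examples, num_folds=NUM_FOLDS):
    """Yield (train, validation) index lists for contiguous folds.

    Contiguous, unshuffled folds are an assumption of this sketch.
    """
    base, extra = divmod(num_examples, num_folds)
    start = 0
    for fold in range(num_folds):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, num_examples))
        yield train, validation
        start += size


if __name__ == "__main__":
    print(len(rgb_subfamilies()), "RGB subfamilies")
    print(len(BETA_GRID), "values of beta")
    print(len(baseline_configurations()), "baseline configurations")
    # 104 is the sonar training-set size listed in the Open Datasets row.
    print([len(val) for _, val in k_fold_indices(104)])
```

Under these assumptions the grids give 7 × 7 subfamilies of base learners for RGB, each tuned over five values of β, against 7 × 7 × 5 configurations for the second grid.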
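
The selection step quoted in the Pseudocode row (step 6) picks, among the S candidate base learners, the one with the smallest empirical loss plus β times its complexity. Below is a minimal sketch of that step alone, not the authors' implementation. It assumes labels in {-1, +1}, the natural-log form of the logistic loss, and that each candidate's updated ensemble scores and normalized complexity Ω in [0, 1] have already been computed. `select_candidate` and its arguments are hypothetical names.

```python
import math


def logistic_loss(label, score):
    """Logistic loss log(1 + exp(-label * score)), computed stably.

    Labels in {-1, +1} and the natural logarithm are assumptions.
    """
    margin = label * score
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def select_candidate(labels, candidate_scores, complexities, beta):
    """Return (index, objective) of the candidate minimizing
    mean loss + beta * complexity.

    candidate_scores[s][i] is the ensemble score on example i after the
    tentative update with candidate s (precomputed by the caller).
    """
    best_index, best_value = None, float("inf")
    for s, (scores, omega) in enumerate(zip(candidate_scores, complexities)):
        empirical = sum(
            logistic_loss(y, f) for y, f in zip(labels, scores)
        ) / len(labels)
        value = empirical + beta * omega
        if value < best_value:
            best_index, best_value = s, value
    return best_index, best_value


if __name__ == "__main__":
    labels = [1, -1, 1, -1]
    # Candidate 0 fits better but is more complex; candidate 1 is simpler.
    candidate_scores = [[2, -2, 2, -2], [0.5, -0.5, 0.5, -0.5]]
    complexities = [1.0, 0.0]
    for beta in (0.1, 1):
        print(beta, select_candidate(labels, candidate_scores, complexities, beta))
```

In this toy setup a small β favors the better-fitting candidate, while a larger β shifts the choice to the simpler one, which is the trade-off the β grid in the Experiment Setup row is tuning.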