Generalized Boosting

Authors: Arun Suggala, Bingbin Liu, Pradeep Ravikumar

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using thorough empirical evaluation, we show that our learning algorithms have superior performance over traditional additive boosting algorithms, as well as existing greedy learning techniques for DNNs."
Researcher Affiliation | Academia | "Arun Sai Suggala, Bingbin Liu, Pradeep Ravikumar, Carnegie Mellon University, Pittsburgh, PA 15213, {asuggala,bingbinl,pradeepr}@cs.cmu.edu"
Pseudocode | Yes | "Algorithm 1 Generalized Boosting ... Algorithm 2 Exact Greedy Update ... Algorithm 3 Gradient Greedy Update" (an illustrative, hedged sketch of a greedy compositional boosting loop of this flavor is given after this table)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "In this section, we compare various techniques on the following image datasets: CIFAR10, MNIST, Fashion MNIST [35], MNIST-rot-back-image [24], convex [35], SVHN [28], and the following tabular datasets from UCI repository [7]: letter recognition [17], forest cover type (covtype), connect4."
Dataset Splits | Yes | "We used hold-out set validation to pick the best hyper-parameters for all the methods. We used 20% of the training data as validation data and picked the best parameters using grid search, based on validation accuracy."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions general software components and optimizers like "XGBoost", "Ada Boost", and "SGD", but does not specify any version numbers for these or other key software dependencies.
Experiment Setup | Yes | "We used hold-out set validation to pick the best hyper-parameters for all the methods. We used 20% of the training data as validation data and picked the best parameters using grid search, based on validation accuracy. After picking the best parameters, we train on the entire training data and report performance on the test data. For all the greedy techniques based on neural networks, we used fully connected blocks and tuned the following parameters: weight decay, width of weak feature transformers, number of boosting iterations T, which we upper bound by 15. For Cmplx Comp Boost, we set D0 = 5. For end-to-end training, we tuned weight decay, width of layers, depth. We used SGD for optimization of all these techniques. The number of epochs and step size schedule of SGD are chosen to ensure convergence. For XGBoost, we tuned the number of trees, depth of each tree, learning rate. The exact values of hyper-parameters tuned for each of the methods can be found in Appendix J." (a minimal sketch of this hold-out grid-search protocol is given after this table)
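
The Pseudocode row above names Algorithm 1 (Generalized Boosting) and its two greedy update subroutines but does not reproduce them. Purely for orientation, below is a hedged Python/PyTorch sketch of a greedy, stage-wise feature-composition boosting loop in the spirit those names suggest; the function name greedy_composition_boosting, the single fully connected block per stage, the linear head, and every default value are illustrative assumptions, not the paper's actual algorithm:

    # Illustrative sketch only (assumed names and update rule, NOT the paper's
    # Algorithm 1): greedily train a small "weak feature transformer" on top of
    # the features produced so far, freeze it, and repeat for T rounds.
    import torch
    import torch.nn as nn

    def greedy_composition_boosting(X, y, T=15, width=256, epochs=50, lr=0.1):
        num_classes = int(y.max().item()) + 1
        features = X                      # stage-0 representation: the raw input
        blocks, head = [], None
        for _ in range(T):
            # Weak feature transformer: one fully connected block (an assumption).
            block = nn.Sequential(nn.Linear(features.shape[1], width), nn.ReLU())
            head = nn.Linear(width, num_classes)      # linear classifier on top
            opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=lr)
            loss_fn = nn.CrossEntropyLoss()
            for _ in range(epochs):                   # fit this stage only
                opt.zero_grad()
                loss_fn(head(block(features)), y).backward()
                opt.step()
            with torch.no_grad():                     # freeze the new block and compose
                features = block(features)
            blocks.append(block)
        return blocks, head

At prediction time one would compose all stored blocks on a new input and apply the final linear head; the paper's exact update rules (and the distinction between the exact and gradient greedy variants) should be taken from Algorithms 1-3 themselves.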
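
The Experiment Setup row describes the tuning protocol: hold out 20% of the training data, grid-search hyperparameters by validation accuracy, then retrain on the full training set before reporting test performance. A minimal sketch of that protocol follows, assuming scikit-learn's train_test_split and a user-supplied train_and_eval callable; the callable and the example grid values are hypothetical placeholders, since the actual search space is listed in the paper's Appendix J:

    # Minimal hold-out grid-search sketch; `train_and_eval` and the grid values
    # are hypothetical placeholders, not the paper's actual search space.
    from itertools import product
    from sklearn.model_selection import train_test_split

    def holdout_grid_search(X, y, train_and_eval, grid):
        # Hold out 20% of the training data as a validation set.
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
        best_score, best_params = float("-inf"), None
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = train_and_eval(X_tr, y_tr, X_val, y_val, **params)  # validation accuracy
            if score > best_score:
                best_score, best_params = score, params
        # Retrain on the full training data with best_params before reporting test accuracy.
        return best_params

    # Hypothetical grid, for illustration only:
    # grid = {"weight_decay": [1e-4, 1e-3], "width": [256, 512], "num_rounds_T": [5, 10, 15]}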