Optimal Minimal Margin Maximization with Boosting

Authors: Alexander Mathiasen, Kasper Green Larsen, Allan Grønlund

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we demonstrate how our sparsification algorithm can be combined with Gradient Boosting. For simplicity we consider a single dataset in this section, the Flight Delay dataset (air); see the full version (Grønlund et al., 2019) for similar results on other datasets. We train a classifier with T = 500 hypotheses using LightGBM (Ke et al., 2017), which we sparsify using Theorem 2.2 to have T = 80 hypotheses. The sparsified classifier is guaranteed to preserve all margins of the original classifier to an additive O(√(lg(n/T)/T)). The cumulative margins of the sparsified classifier and the original classifier are depicted in Figure 1. Furthermore, we also depict the cumulative margins of a LightGBM classifier trained to have T = 80 hypotheses. ... To investigate this, we performed additional experiments computing AUC and classification accuracy of several sparsified classifiers and LightGBM classifiers on a test set. ... The experiments indeed show that the sparsified classifiers outperform the LightGBM classifiers with the same number of hypotheses. See Figure 2 for test AUC and test classification accuracy. (A hedged code sketch of this pipeline follows the table.)
Researcher Affiliation | Academia | Allan Grønlund*¹, Kasper Green Larsen*¹, Alexander Mathiasen*¹. ¹Department of Computer Science, University of Aarhus, Denmark.
Pseudocode | Yes | Algorithm 1 (SparsiBoost). Input: training data D = {(x_i, y_i)}_{i=1}^n where x_i ∈ X for some input space X and y_i ∈ {−1, +1}; target number of hypotheses T and base learning algorithm A. Output: hypotheses h_1, …, h_k and weights w_1, …, w_k with k ≤ T, such that ∑_i w_i h_i has gap O(√(lg(2 + n/T)/T)) on D. 1. Run AdaBoostV with base learning algorithm A on training data D to get c·T hypotheses h_1, …, h_{c·T} and weights w_1, …, w_{c·T} for the integer c = ⌈lg(n)/lg(2 + n/T)⌉. 2. Construct the margin matrix U ∈ [−1, +1]^{n×c·T} where u_{ij} = y_i h_j(x_i). 3. Form the vector w with i-th coordinate w_i and normalize w ← w/‖w‖_1 so that ‖w‖_1 = 1. 4. Find w̃ such that ‖w̃‖_0 ≤ T, ‖w̃‖_1 = 1 and ‖Uw − Uw̃‖_∞ = O(√(lg(2 + n/T)/T)). 5. Let π(j) denote the index of the j-th non-zero entry of w̃. 6. Return hypotheses h_{π(1)}, …, h_{π(‖w̃‖_0)} with weights w̃_{π(1)}, …, w̃_{π(‖w̃‖_0)}. (A Python skeleton of these steps follows the table.)
Open Source Code | Yes | A Python implementation of Theorem 2.2 can be found at: https://github.com/AlgoAU/DiscMin
Open Datasets | Yes | For simplicity we consider a single dataset in this section, the Flight Delay dataset (air). ... Flight delay data: https://github.com/szilard/benchm-ml#data. ... Inspired by the experiments in (Wang et al., 2008; Ke et al., 2017; Chen & Guestrin, 2016), we also performed the above experiments on the Higgs (Whiteson, 2014) and Letter (Dheeru & Karra Taniskidou, 2017) datasets. (A hedged loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a 'test set' for evaluation but does not specify the dataset split percentages, sample counts, or whether a validation set was used, nor does it cite a predefined split for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions 'LightGBM' and a 'Python implementation' but does not specify exact version numbers for these or any other software dependencies required for reproducibility.
Experiment Setup | Yes | We train a classifier with T = 500 hypotheses using LightGBM (Ke et al., 2017), which we sparsify using Theorem 2.2 to have T = 80 hypotheses. ... The plot depicts test AUC and test classification accuracy of a LightGBM classifier during training as the number of hypotheses increases (in blue). (This evaluation loop is sketched below.)
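
The pipeline quoted in the Research Type row can be approximated in a few lines. The sketch below is a hedged illustration, not the authors' code: it trains a 500-tree LightGBM model on placeholder data, builds a margin matrix from per-tree raw scores, and hands uniform weights to a hypothetical `sparsify_weights` routine standing in for Theorem 2.2 (the authors' actual implementation lives in the AlgoAU/DiscMin repository). Note that raw LightGBM tree outputs are not normalized to [−1, +1], so this only mirrors the structure of the experiment.

```python
# Hedged sketch of the Research Type experiment: LightGBM with T = 500 trees,
# margin matrix u_ij = y_i * h_j(x_i), then sparsification to 80 hypotheses.
# `sparsify_weights` is hypothetical; it stands in for the Theorem 2.2 routine.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))            # placeholder for the Flight Delay features
y_pm1 = np.where(rng.random(1000) < 0.5, -1, 1)  # labels in {-1, +1}

train_set = lgb.Dataset(X_train, label=(y_pm1 > 0).astype(int))  # LightGBM expects {0, 1}
booster = lgb.train({"objective": "binary", "verbosity": -1}, train_set, num_boost_round=500)

def per_tree_margins(booster, X, y_pm1, n_trees):
    """Margin matrix with u_ij = y_i * (raw score contributed by tree j alone)."""
    U = np.empty((X.shape[0], n_trees))
    prev = np.zeros(X.shape[0])
    for j in range(1, n_trees + 1):
        upto_j = booster.predict(X, raw_score=True, num_iteration=j)
        U[:, j - 1] = y_pm1 * (upto_j - prev)
        prev = upto_j
    return U

U = per_tree_margins(booster, X_train, y_pm1, n_trees=500)
w = np.full(500, 1.0 / 500)                # uniform weights over the 500 trees
w_sparse = sparsify_weights(U, w, k=80)    # hypothetical: <= 80 non-zeros, ||w||_1 = 1

margins_full = U @ w                       # margins of the original classifier (Figure 1)
margins_sparse = U @ w_sparse              # margins after sparsification to 80 hypotheses
```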
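
The Algorithm 1 row translates almost line for line into Python. The skeleton below is only a structural sketch: `ada_boost_v` and `sparsify` are hypothetical callables standing in for the paper's AdaBoostV booster (step 1) and its Theorem 2.2 sparsification step (step 4), neither of which is reimplemented here.

```python
# Structural sketch of Algorithm 1 (SparsiBoost); `ada_boost_v` and `sparsify`
# are hypothetical stand-ins for the paper's step 1 and step 4 subroutines.
import math
import numpy as np

def sparsi_boost(X, y, T, base_learner, ada_boost_v, sparsify):
    n = len(y)
    # Step 1: run AdaBoostV for c*T rounds, c = ceil(lg(n) / lg(2 + n/T)).
    c = math.ceil(math.log2(n) / math.log2(2 + n / T))
    hypotheses, w = ada_boost_v(X, y, base_learner, rounds=c * T)
    # Step 2: margin matrix u_ij = y_i * h_j(x_i), one column per hypothesis.
    U = np.column_stack([y * h(X) for h in hypotheses])
    # Step 3: normalize the weight vector to unit l1 norm.
    w = np.asarray(w, dtype=float)
    w /= np.abs(w).sum()
    # Step 4: T-sparse w_tilde with ||Uw - Uw_tilde||_inf = O(sqrt(lg(2 + n/T)/T)).
    w_tilde = sparsify(U, w, max_nonzeros=T)
    # Steps 5-6: keep only the hypotheses that carry non-zero weight.
    keep = np.flatnonzero(w_tilde)
    return [hypotheses[j] for j in keep], w_tilde[keep]
```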
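
For the Open Datasets row, the Flight Delay CSVs come from the szilard/benchm-ml benchmark linked above. The loading sketch below is hedged: the local file names and the target column name ("dep_delayed_15min") are assumptions about that benchmark's layout, not details stated in the paper.

```python
# Hedged loading sketch for the Flight Delay data (szilard/benchm-ml).
# File paths and the target column name are assumptions about that benchmark.
import pandas as pd

train = pd.read_csv("train-1m.csv")   # placeholder path to a benchm-ml training CSV
test = pd.read_csv("test.csv")        # placeholder path to the benchm-ml test CSV

target = "dep_delayed_15min"          # assumed label column with "Y"/"N" values

# One-hot encode the categorical features jointly so train and test share columns.
features = pd.get_dummies(
    pd.concat([train.drop(columns=[target]), test.drop(columns=[target])],
              keys=["train", "test"])
)
X_train, X_test = features.loc["train"], features.loc["test"]

# Labels in {-1, +1} to match the margin formulation u_ij = y_i * h_j(x_i).
y_train = train[target].map({"Y": 1, "N": -1}).to_numpy()
y_test = test[target].map({"Y": 1, "N": -1}).to_numpy()
```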
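
Finally, the Experiment Setup row describes tracking test AUC and classification accuracy as the number of LightGBM hypotheses grows. Below is a hedged sketch of that evaluation loop on synthetic placeholder data rather than the paper's test split.

```python
# Hedged sketch: test AUC and accuracy of a LightGBM classifier as the number
# of trees (hypotheses) increases, mirroring the curves described in Figure 2.
import numpy as np
import lightgbm as lgb
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                               # placeholder data
y01 = (X[:, 0] + 0.5 * rng.normal(size=5000) > 0).astype(int)
X_train, X_test = X[:4000], X[4000:]
y_train, y_test = y01[:4000], y01[4000:]

clf = lgb.LGBMClassifier(n_estimators=500)
clf.fit(X_train, y_train)

auc_curve, acc_curve = [], []
for t in range(1, clf.n_estimators + 1):
    # Score the test set using only the first t trees of the 500-tree model.
    proba = clf.predict_proba(X_test, num_iteration=t)[:, 1]
    auc_curve.append(roc_auc_score(y_test, proba))
    acc_curve.append(accuracy_score(y_test, proba >= 0.5))
```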