Optimal Minimal Margin Maximization with Boosting
Authors: Alexander Mathiasen, Kasper Green Larsen, Allan Grønlund
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we demonstrate how our sparsification algorithm can be combined with Gradient Boosting. For simplicity we consider a single dataset in this section, the Flight Delay dataset (air); see the full version (Grønlund et al., 2019) for similar results on other datasets. We train a classifier with T = 500 hypotheses using LightGBM (Ke et al., 2017), which we sparsify using Theorem 2.2 to have T = 80 hypotheses. The sparsified classifier is guaranteed to preserve all margins of the original classifier to an additive O(√(lg(n/T)/T)). The cumulative margins of the sparsified classifier and the original classifier are depicted in Figure 1. Furthermore, we also depict the cumulative margins of a LightGBM classifier trained to have T = 80 hypotheses. ... To investigate this, we performed additional experiments computing AUC and classification accuracy of several sparsified classifiers and LightGBM classifiers on a test set. ... The experiments indeed show that the sparsified classifiers outperform the LightGBM classifiers with the same number of hypotheses. See Figure 2 for test AUC and test classification accuracy. |
| Researcher Affiliation | Academia | Allan Grønlund *1, Kasper Green Larsen *1, Alexander Mathiasen *1. 1 Department of Computer Science, University of Aarhus, Denmark. |
| Pseudocode | Yes | Algorithm 1 SparsiBoost. Input: training data D = {(x_i, y_i)}_{i=1}^n where x_i ∈ X for some input space X and y_i ∈ {−1, +1}; target number of hypotheses T; and base learning algorithm A. Output: hypotheses h_1, ..., h_k and weights w_1, ..., w_k with k ≤ T, such that Σ_i w_i h_i has gap O(√(lg(2 + n/T)/T)) on D. 1. Run AdaBoostV with base learning algorithm A on training data D to get c·T hypotheses h_1, ..., h_{c·T} and weights w_1, ..., w_{c·T}, for the integer c = ⌈lg(n)/lg(2 + n/T)⌉. 2. Construct the margin matrix U ∈ [−1, +1]^{n×c·T} where u_{ij} = y_i h_j(x_i). 3. Form the vector w with i-th coordinate w_i and normalize w ← w/‖w‖_1 so that ‖w‖_1 = 1. 4. Find w̃ such that ‖w̃‖_0 ≤ T, ‖w̃‖_1 = 1 and ‖Uw − Uw̃‖_∞ = O(√(lg(2 + n/T)/T)). 5. Let π(j) denote the index of the j-th non-zero entry of w̃. 6. Return hypotheses h_{π(1)}, ..., h_{π(‖w̃‖_0)} with weights w̃_{π(1)}, ..., w̃_{π(‖w̃‖_0)}. (A hedged Python sketch of this pipeline follows the table.) |
| Open Source Code | Yes | A Python implementation of Theorem 2.2 can be found at: https://github.com/AlgoAU/DiscMin |
| Open Datasets | Yes | For simplicity we consider a single dataset in this section, the Flight Delay dataset (air)... Flight delay data. https://github.com/szilard/benchm-ml#data. ... Inspired by the experiments in (Wang et al., 2008; Ke et al., 2017; Chen & Guestrin, 2016) we also performed the above experiments on the Higgs (Whiteson, 2014) and Letter (Dheeru & Karra Taniskidou, 2017) datasets. |
| Dataset Splits | No | The paper mentions using a 'test set' for evaluation but does not specify the dataset split percentages, sample counts, or whether a validation set was used, nor does it cite a predefined split for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions 'Light GBM' and 'Python implementation' but does not specify exact version numbers for these or any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | We train a classifier with T = 500 hypotheses using LightGBM (Ke et al., 2017), which we sparsify using Theorem 2.2 to have T = 80 hypotheses. ... The plot depicts test AUC and test classification accuracy of a LightGBM classifier during training as the number of hypotheses increases (in blue). (A hedged sketch of this train-then-sparsify workflow follows the table, after the Algorithm 1 sketch.) |
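
The following is a minimal sketch of the data flow in Algorithm 1 (margin matrix, ℓ1-normalization, sparsification, selection of surviving hypotheses). The `naive_sparsify` step below simply keeps the T largest-magnitude weights and renormalizes; it is a stand-in for the discrepancy-minimization routine in the authors' DiscMin repository and does not achieve the stated O(√(lg(2 + n/T)/T)) gap guarantee. All function names are illustrative, not from the paper.

```python
import numpy as np

def margin_matrix(hypotheses, X, y):
    """Step 2: margin matrix U with u_ij = y_i * h_j(x_i)."""
    H = np.column_stack([h(X) for h in hypotheses])   # each h maps samples to [-1, +1]
    return np.asarray(y, dtype=float)[:, None] * H

def naive_sparsify(w, T):
    """Stand-in for step 4: keep the T largest-magnitude weights and renormalize.

    The paper instead chooses w~ via discrepancy minimization so that
    ||Uw - Uw~||_inf = O(sqrt(lg(2 + n/T)/T)); this greedy truncation carries
    no such guarantee and only makes the data flow concrete.
    """
    keep = np.argsort(np.abs(w))[-T:]                 # indices of the T largest |w_j|
    w_sparse = np.zeros_like(w)
    w_sparse[keep] = w[keep]
    return w_sparse / np.abs(w_sparse).sum()          # re-normalize so ||w~||_1 = 1

def sparsiboost_like(hypotheses, w, X, y, T):
    """Steps 2-6 of Algorithm 1, using the naive sparsifier above."""
    w = np.asarray(w, dtype=float)
    w = w / np.abs(w).sum()                           # step 3: normalize so ||w||_1 = 1
    U = margin_matrix(hypotheses, X, y)               # step 2
    w_sparse = naive_sparsify(w, T)                   # step 4 (stand-in)
    kept = np.flatnonzero(w_sparse)                   # step 5: surviving hypothesis indices
    gap = np.abs(U @ w - U @ w_sparse).max()          # achieved ||Uw - Uw~||_inf
    return [hypotheses[j] for j in kept], w_sparse[kept], gap
```

Here `hypotheses` is assumed to be a list of callables returning predictions in [−1, +1]; step 1 (running AdaBoostV to obtain the c·T hypotheses) is omitted.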
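Below is a hedged sketch of the reported experimental setup: train a LightGBM classifier with T = 500 trees, build the margin matrix from per-tree raw scores, sparsify to T = 80 trees, and compare test AUC. It assumes LightGBM ≥ 3.0 (for `start_iteration` in `Booster.predict`); `load_flight_delay` and `sparsify` are hypothetical placeholders for a benchm-ml data loader and the Theorem 2.2 routine from the DiscMin repository. Note that raw per-tree scores are not confined to [−1, +1] as the theory assumes, so this is illustrative only.

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder loader for the Flight Delay data
# (https://github.com/szilard/benchm-ml#data); returns features X and labels y in {-1, +1}.
X, y = load_flight_delay()                                # hypothetical helper, not shown

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Full classifier with T = 500 trees, mirroring the experiment described above.
booster = lgb.train({"objective": "binary", "verbosity": -1},
                    lgb.Dataset(X_tr, label=(y_tr + 1) // 2),
                    num_boost_round=500)

def per_tree_scores(booster, X):
    """Raw score h_j(x) of each individual tree (requires LightGBM >= 3.0)."""
    return np.column_stack([
        booster.predict(X, start_iteration=j, num_iteration=1, raw_score=True)
        for j in range(booster.num_trees())
    ])

U = y_tr[:, None] * per_tree_scores(booster, X_tr)        # margin matrix u_ij = y_i h_j(x_i)

# `sparsify` stands for the Theorem 2.2 routine (https://github.com/AlgoAU/DiscMin):
# it should return T-sparse, l1-normalized weights over the 500 trees.
w_sparse = sparsify(U, T=80)                              # hypothetical call

H_te = per_tree_scores(booster, X_te)
print("test AUC, full 500-tree model :", roc_auc_score(y_te, H_te.sum(axis=1)))
print("test AUC, sparsified 80 trees :", roc_auc_score(y_te, H_te @ w_sparse))
```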