IPBoost – Non-Convex Boosting via Integer Programming

Authors: Marc Pfetsch, Sebastian Pokutta

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report results that are comparable to or better than the current state-of-the-art. We present computational results demonstrating that IP-based boosting can avoid the bad examples of (Long & Servedio, 2008): by far better solutions can be obtained via LP/IP-based boosting for these instances. We also show that IP-based boosting can be competitive for real-world instances from the LIBSVM data set.
Researcher Affiliation | Academia | Department of Mathematics, TU Darmstadt, Germany; Department of Mathematics, TU Berlin and Zuse Institute Berlin, Berlin, Germany.
Pseudocode | Yes | Algorithm 1 IPBoost
Open Source Code | Yes | The code is available through the web pages of the authors.
Open Datasets | Yes | We use classification instances from the LIBSVM data sets available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Dataset Splits | No | Note that we randomly split off 20% of the points for the test set, and recall that we report the averages of 10 runs. For the other 24 instances, we randomly split off 20% of the points as a test set. The paper describes train and test splits, but does not explicitly describe a separate validation split.
Hardware Specification | Yes | All tests were run on a Linux cluster with Intel Xeon quad-core CPUs with 3.50 GHz, 10 MB cache, and 32 GB of main memory.
Software Dependencies | Yes | We used a prerelease version of SCIP 7.0.0 with SoPlex 5.0.0 as LP solver (Gamrath et al., 2020), the Python framework scikit-learn (Pedregosa et al., 2011), and the AdaBoost implementation in scikit-learn version 0.21.3.
Experiment Setup | Yes | We use the decision tree implementation of scikit-learn with a maximal depth of 1, i.e., a decision stump, as base learner for all boosters. We performed 10 runs for each instance with varying random seeds, used a time limit of one hour for each run of IPBoost, and subsampled 30 000 points whenever the number of points N exceeded this threshold. Another crucial choice in our approach is the margin bound ρ; we ran our code with different values, and the aggregated results are presented in Table 2.
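The evaluation protocol described in the rows above (decision stumps as base learners, a random 80/20 train/test split, and results averaged over 10 seeded runs) can be sketched with scikit-learn's AdaBoost as the boosting baseline. This is a minimal illustration, not the paper's IPBoost code: the synthetic data from `make_classification` stands in for the LIBSVM instances, and the choices of `n_samples` and `n_estimators` here are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the paper uses LIBSVM classification instances instead.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

accuracies = []
for seed in range(10):  # the paper averages over 10 runs with varying seeds
    # Randomly split off 20% of the points as the test set.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    stump = DecisionTreeClassifier(max_depth=1)  # decision stump base learner
    # The base learner is passed positionally: the keyword was renamed from
    # `base_estimator` (used in 0.21.3, the paper's version) to `estimator`
    # in recent scikit-learn releases.
    booster = AdaBoostClassifier(stump, n_estimators=100, random_state=seed)
    booster.fit(X_tr, y_tr)
    accuracies.append(booster.score(X_te, y_te))

print(f"mean test accuracy over 10 runs: {np.mean(accuracies):.3f}")
```

Reported numbers would then be the mean over the 10 runs, mirroring the averaging described in the Dataset Splits row.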