IPBoost – Non-Convex Boosting via Integer Programming

Authors: Marc Pfetsch, Sebastian Pokutta

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report results that are comparable to or better than the current state-of-the-art. We present computational results demonstrating that IP-based boosting can avoid the bad examples of (Long & Servedio, 2008): by far better solutions can be obtained via LP/IP-based boosting for these instances. We also show that IP-based boosting can be competitive for real-world instances from the LIBSVM data set.
Researcher Affiliation | Academia | Department of Mathematics, TU Darmstadt, Germany; Department of Mathematics, TU Berlin and Zuse Institute Berlin, Berlin, Germany.
Pseudocode | Yes | Algorithm 1 IPBoost
Open Source Code | Yes | The code is available through the web pages of the authors.
Open Datasets | Yes | We use classification instances from the LIBSVM data sets available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Dataset Splits | No | Note that we randomly split off 20% of the points for the test set, and recall that we report the averages of 10 runs. For the other 24 instances, we randomly split off 20% of the points as a test set. The paper describes train and test splits, but does not explicitly describe a separate validation split.
Hardware Specification | Yes | All tests were run on a Linux cluster with Intel Xeon quad-core CPUs with 3.50 GHz, 10 MB cache, and 32 GB of main memory.
Software Dependencies | Yes | We used a prerelease version of SCIP 7.0.0 with SoPlex 5.0.0 as LP solver (Gamrath et al., 2020), the Python framework scikit-learn (Pedregosa et al., 2011), and the AdaBoost implementation in scikit-learn version 0.21.3.
Experiment Setup | Yes | We use the decision tree implementation of scikit-learn with a maximal depth of 1, i.e., a decision stump, as base learner for all boosters. We performed 10 runs for each instance with varying random seeds, used a time limit of one hour for each run of IPBoost, and subsampled 30 000 points whenever the number of points N exceeded this threshold. Another crucial choice in our approach is the margin bound ρ; we ran our code with different values, and the aggregated results are presented in Table 2.
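The evaluation protocol described in the rows above (decision stumps as base learners, a random 80/20 train/test split, and results averaged over 10 seeded runs) can be sketched with scikit-learn's AdaBoost as the boosting baseline. This is a minimal illustration, not the paper's IPBoost code: the synthetic data from `make_classification` stands in for the LIBSVM instances, and the choices of `n_samples` and `n_estimators` here are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the paper uses LIBSVM classification instances instead.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

accuracies = []
for seed in range(10):  # the paper averages over 10 runs with varying seeds
    # Randomly split off 20% of the points as the test set.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    stump = DecisionTreeClassifier(max_depth=1)  # decision stump base learner
    # The base learner is passed positionally: the keyword was renamed from
    # `base_estimator` (used in 0.21.3, the paper's version) to `estimator`
    # in recent scikit-learn releases.
    booster = AdaBoostClassifier(stump, n_estimators=100, random_state=seed)
    booster.fit(X_tr, y_tr)
    accuracies.append(booster.score(X_te, y_te))

print(f"mean test accuracy over 10 runs: {np.mean(accuracies):.3f}")
```

Reported numbers would then be the mean over the 10 runs, mirroring the averaging described in the Dataset Splits row.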