The Many Faces of Optimal Weak-to-Strong Learning

Authors: Mikael Møller Høgsgaard, Kasper Green Larsen, Markus Engelund Mathiasen

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental In addition to this theoretical contribution, we also perform the first empirical comparison of the proposed sample optimal Boosting algorithms. Our pilot empirical study suggests that our new algorithm might outperform previous algorithms on large data sets.
Researcher Affiliation Academia Mikael Møller Høgsgaard, Department of Computer Science, Aarhus University, hogsgaards@cs.au.dk; Kasper Green Larsen, Department of Computer Science, Aarhus University, larsen@cs.au.dk; Markus Engelund Mathiasen, Department of Computer Science, Aarhus University, markusm@cs.au.dk
Pseudocode Yes Algorithm 1: MAJORITY-OF-5(S, W) Input: Training set S = (x1, y1), ..., (xm, ym). Weak learner W. Result: Hypothesis g : X → {-1, 1}. 1 Partition S into 5 disjoint pieces S1, ..., S5 of size m/5. 2 for t = 1, ..., 5 do 3 Run AdaBoost on St with W to obtain ft : X → {-1, 1}.
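To make the quoted pseudocode concrete, here is a minimal Python sketch of MAJORITY-OF-5, assuming scikit-learn's AdaBoostClassifier with depth-1 decision trees as the weak learner (matching the experiment description below) and labels in {-1, +1}. The function name, the random seed, and the 300-round default are illustrative choices, not the authors' released code; the `estimator` keyword requires scikit-learn >= 1.2 (older versions use `base_estimator`).

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def majority_of_5(X, y, n_rounds=300, seed=0):
    """Sketch of MAJORITY-OF-5: split the training set into 5 disjoint
    pieces, run AdaBoost on each piece, and return the majority vote.
    Assumes X, y are NumPy arrays with labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    pieces = np.array_split(perm, 5)  # 5 disjoint pieces of size ~m/5
    voters = []
    for idx in pieces:
        clf = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
            n_estimators=n_rounds,
        )
        clf.fit(X[idx], y[idx])
        voters.append(clf)

    def g(X_new):
        # With 5 voters and +/-1 labels, the sum is never zero,
        # so the sign is exactly the majority vote.
        votes = sum(clf.predict(X_new) for clf in voters)
        return np.sign(votes)

    return g
```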
Open Source Code Yes Justification: The algorithms have been described in detail, and the parameters for the experiments are given in the article. Furthermore, the code used for the experiments is provided.
Open Datasets Yes Higgs [29]: This data set represents measurements from particle detectors, and the labels tell whether the measurements come from a process producing Higgs bosons or from a background process. The data set consists of 11 million labeled samples. However, we focus on the first 300,000 samples. Each sample consists of 28 features, where 7 of these are derived from the other 21. Boone [23]: In this data set, we try to distinguish electron neutrinos from muon neutrinos. The data set consists of 130,065 labeled samples. Each sample consists of 50 features. Forest Cover [4]: In this data set, we try to determine the forest cover type of 30 x 30 meter cells. The data set actually has 7 different forest cover types, so we have removed all samples of the 5 most uncommon to make it into a binary classification problem. This leaves us with 495,141 samples. Each sample consists of 54 features such as elevation, soil type and more. Diabetes [28]: In this data set, we try to determine whether a patient has diabetes or not from features such as BMI, insulin level, age and so on. This is the smallest real-world data set, consisting of only 768 samples. Each sample consists of 8 features. Adversarial [15]: This data set, as well as the weak learner, was developed using the lower bound instance in [15].
Dataset Splits No For all real-world data sets, we have shuffled the samples and randomly set aside 20% to use as a test set.
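A minimal sketch of the described split, assuming the features X and labels y are already loaded as arrays; the random_state value is an arbitrary choice for illustration, since the quoted text does not specify a seed or fixed split files.

```python
from sklearn.model_selection import train_test_split

# Shuffle and hold out 20% of the samples as a test set,
# as described for the real-world data sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
```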
Hardware Specification No The paper does not mention any specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for its experiments.
Software Dependencies No The weak learner we use for these is the scikit-learn Decision Tree Classifier with max_depth=1. This is the default for the implementation of AdaBoost in scikit-learn, which is the implementation used in our experiments.
Experiment Setup Yes The weak learner we use for these is the scikit-learn Decision Tree Classifier with max_depth=1. This is the default for the implementation of AdaBoost in scikit-learn, which is the implementation used in our experiments. ... Each of these voting classifiers is then trained for 300 rounds on its respective input. ... For BAGGEDADABOOST, we have chosen to sample 95% of the samples (with replacement) in our experiments.
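As a rough illustration of the quoted setup, the sketch below builds the 300-round AdaBoost learner with the stated depth-1 weak learner, together with one plausible reading of BAGGEDADABOOST as a bagging ensemble of such AdaBoost voters, each trained on a 95% bootstrap sample. The number of bagged voters and the use of scikit-learn's BaggingClassifier are assumptions for illustration, not the authors' implementation; the `estimator` keyword requires scikit-learn >= 1.2 (older versions use `base_estimator`).

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Weak learner reported in the paper: a depth-1 decision tree (stump).
stump = DecisionTreeClassifier(max_depth=1)

# Plain AdaBoost: 300 boosting rounds with decision stumps.
adaboost = AdaBoostClassifier(estimator=stump, n_estimators=300)

# One plausible reading of BAGGEDADABOOST: bag several AdaBoost voters,
# each fit on a bootstrap sample containing 95% of the training set drawn
# with replacement. The number of voters (5 here) is an assumed value.
bagged_adaboost = BaggingClassifier(
    estimator=AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), n_estimators=300
    ),
    n_estimators=5,
    max_samples=0.95,
    bootstrap=True,
)
```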