Multiclass Boosting and the Cost of Weak Learning

Authors: Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work we study multiclass boosting with a possibly large number of classes or categories. We find that the boosting algorithm itself only requires O(log k) samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by a weak learner is at least polynomial in k, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in k, showing that the returned weak hypotheses must be nearly the best in their class when k is large. We also prove a trade-off between the number of oracle calls and the resources required of the weak learner, meaning that the fewer the calls to the weak learner, the more that is demanded on each call.
Researcher Affiliation | Collaboration | Nataly Brukhim (Princeton University; Google AI Princeton) nbrukhim@princeton.edu; Elad Hazan (Princeton University; Google AI Princeton) ehazan@princeton.edu; Shay Moran (Technion; Google Research) smoran@technion.ac.il; Indraneel Mukherjee indraneel.mukherjee@protonmail.com; Robert E. Schapire (Microsoft Research) schapire@microsoft.com
Pseudocode | Yes | Algorithm 1: Multiclass AdaBoost (an illustrative sketch follows this table)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes theoretical concepts and sample-complexity calculations (e.g., m_W = O(k^2)) but does not mention the use of a specific, publicly available dataset or provide access information for one.
Dataset Splits | No | The paper does not specify any dataset splits (e.g., percentages or counts for training, validation, or test sets), as it is a theoretical work.
Hardware Specification | No | The paper does not provide details on the hardware used for experiments, as it is a theoretical work without empirical evaluations requiring such specifications.
Software Dependencies | No | The paper does not specify software dependencies with version numbers, as it is a theoretical work that does not describe a software implementation.
Experiment Setup | No | The paper does not provide details of an experimental setup, such as hyperparameters or training configurations, as it is a theoretical work.
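
For readers who want a concrete picture of what the pseudocode row refers to, below is a minimal SAMME-style sketch of multiclass boosting in Python. It is an illustration under stated assumptions, not the paper's Algorithm 1: it assumes labels are integers in {0, ..., k-1}, uses scikit-learn's depth-1 decision tree as a stand-in weak learner, and the function names boost and predict are hypothetical.

# Minimal multiclass boosting sketch (SAMME-style); not the paper's Algorithm 1.
# Assumes scikit-learn is available and labels are integers in {0, ..., k-1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, k, n_rounds=50):
    """Train a multiclass booster; returns (hypotheses, alphas)."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                 # example weights maintained by the booster
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y).astype(float)
        err = float(np.dot(w, miss))        # weighted training error of this round's hypothesis
        if err >= 1.0 - 1.0 / k:            # no better than random guessing over k classes: stop
            break
        err = max(err, 1e-12)
        alpha = np.log((1.0 - err) / err) + np.log(k - 1.0)  # multiclass hypothesis weight
        w *= np.exp(alpha * miss)           # up-weight misclassified examples
        w /= w.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X, k):
    """Combine hypotheses by a weighted vote over the k classes."""
    X = np.asarray(X)
    votes = np.zeros((len(X), k))
    for h, a in zip(hypotheses, alphas):
        votes[np.arange(len(X)), h.predict(X)] += a
    return votes.argmax(axis=1)

The per-example weights, the 1 - 1/k error threshold, and the weighted vote mirror the standard multiclass generalization of AdaBoost; the paper's variant and its sample-complexity guarantees differ in the details.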