Multiclass Boosting and the Cost of Weak Learning
Authors: Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work we study multiclass boosting with a possibly large number of classes or categories. We find that the boosting algorithm itself only requires O(log k) samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by a weak learner is at least polynomial in k, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in k, showing that the returned weak hypotheses must be nearly the best in their class when k is large. We also prove a trade-off between number of oracle calls and the resources required of the weak learner, meaning that the fewer calls to the weak learner the more that is demanded on each call. |
| Researcher Affiliation | Collaboration | Nataly Brukhim (Princeton University; Google AI Princeton), nbrukhim@princeton.edu; Elad Hazan (Princeton University; Google AI Princeton), ehazan@princeton.edu; Shay Moran (Technion; Google Research), smoran@technion.ac.il; Indraneel Mukherjee, indraneel.mukherjee@protonmail.com; Robert E. Schapire (Microsoft Research), schapire@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Multiclass AdaBoost (see the sketch after this table) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes theoretical concepts and calculations of sample complexity (e.g., m_W = O(k^2)) but does not mention the use of a specific, publicly available dataset or provide access information for one. |
| Dataset Splits | No | The paper does not specify any dataset splits (e.g., percentages or counts for training, validation, or test sets) as it is a theoretical work. |
| Hardware Specification | No | The paper does not provide specific details on hardware used for experiments, as it is a theoretical work and does not describe empirical evaluations requiring such specifications. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, as it is a theoretical work and does not detail a software implementation requiring such specifications. |
| Experiment Setup | No | The paper does not provide specific details about an experimental setup, such as hyperparameters or training configurations, as it is a theoretical work. |
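
The pseudocode flagged above (Algorithm 1, Multiclass AdaBoost) appears in the paper only as pseudocode, and no reference implementation is released. As a rough illustration of the kind of boosting loop the abstract describes, here is a minimal SAMME-style multiclass AdaBoost sketch in Python. The `weak_learner` oracle interface, the SAMME weight update with its log(k - 1) term, and all names below are assumptions made for illustration; this is not the paper's Algorithm 1, whose update rule and agnostic weak-learning guarantee differ.

```python
import numpy as np

def multiclass_adaboost(X, y, k, weak_learner, T):
    """Minimal SAMME-style multiclass AdaBoost sketch (illustrative only).

    weak_learner(X, y, weights) is a hypothetical oracle returning a
    hypothesis h mapping inputs to labels in {0, ..., k-1}; it stands in
    for the agnostic weak learner assumed in the paper.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)            # uniform initial distribution over examples
    hypotheses, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, w)      # one oracle call on the reweighted sample
        pred = h(X)
        err = np.sum(w * (pred != y))  # weighted classification error
        if err >= 1.0 - 1.0 / k:       # no better than random guessing over k classes
            break
        # SAMME weight: the log(k - 1) term generalizes binary AdaBoost to k classes
        alpha = np.log((1.0 - err) / max(err, 1e-12)) + np.log(k - 1)
        w *= np.exp(alpha * (pred != y))   # upweight misclassified examples
        w /= w.sum()                       # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def strong_classifier(X_new):
        # weighted plurality vote over the collected weak hypotheses
        votes = np.zeros((len(X_new), k))
        for h, a in zip(hypotheses, alphas):
            votes[np.arange(len(X_new)), h(X_new)] += a
        return votes.argmax(axis=1)

    return strong_classifier
```

The weighted plurality vote matches the general shape of multiclass boosting described in the abstract: the booster itself only reweights examples and aggregates hypotheses (hence its small O(log k) sample cost), while each oracle call must return a hypothesis noticeably better than random guessing over k classes.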