Label Efficient Learning by Exploiting Multi-Class Output Codes

Authors: Maria Balcan, Travis Dick, Yishay Mansour

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between the success of these methods and the existence of label-efficient learning procedures. We show that in both the realizable and agnostic cases, if output codes are successful at learning from labeled data, they implicitly assume structure on how the classes are related. By making that structure explicit, we design learning algorithms to recover the classes with low label complexity. We provide results for the commonly studied cases of one-vs-all learning and when the codewords of the classes are well separated. We additionally consider the more challenging case where the codewords are not well separated, but satisfy a boundary features condition that captures the natural intuition that every bit of the codewords should be significant. Full proofs of all our results can be found in the full version of the paper (Balcan, Dick, and Mansour 2016), while we present the modeling and technical flow of ideas here.
Researcher Affiliation | Collaboration | Maria Florina Balcan, School of Computer Science, Carnegie Mellon University (ninamf@cs.cmu.edu); Travis Dick, School of Computer Science, Carnegie Mellon University (tdick@cs.cmu.edu); Yishay Mansour, Microsoft Research and Tel Aviv University (mansour@tau.ac.il)
Pseudocode | Yes | Algorithm 1: Single-linkage learning; Algorithm 2: Hierarchical single-linkage learning; Algorithm 3: Robust single-linkage learning; Algorithm 4: Plane-detection algorithm. (Illustrative sketches of output-code decoding and of the single-linkage idea follow this table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology described, nor does it include links to a code repository.
Open Datasets | No | The paper does not mention the use of specific, named datasets for training or provide any information about their public availability or source. It discusses abstract data distributions, such as a distribution P on an instance space X ⊆ R^d.
Dataset Splits | No | As a theoretical paper, it does not describe specific training, validation, or test dataset splits or their sizes. It discusses theoretical sample complexities instead.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware components (e.g., GPU/CPU models, memory specifications) used for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies or their version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details such as hyperparameter values, training schedules, or system-level configurations.
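
The abstract above centers on one-vs-all and error-correcting output codes (ECOC). For context, here is a minimal, hypothetical sketch of standard ECOC decoding, not the paper's code: each class is assigned a binary codeword, a point is passed through the binary predictors, and the predicted class is the one whose codeword is nearest in Hamming distance. The function name and the `predictors`/`codewords` arguments are illustrative assumptions.

```python
import numpy as np

def ecoc_predict(x, predictors, codewords):
    # predictors: list of m functions, each mapping a point x to {0, 1}
    # codewords: (k, m) binary array, one codeword row per class
    bits = np.array([h(x) for h in predictors])    # predicted codeword for x
    dists = np.sum(codewords != bits, axis=1)      # Hamming distance to each class
    return int(np.argmin(dists))                   # nearest codeword wins

# One-vs-all is the special case where the codeword matrix is the identity:
# codewords = np.eye(k, dtype=int)
```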
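Similarly, a hedged sketch of the idea behind Algorithm 1 (single-linkage learning), assuming it amounts to single-linkage clustering of unlabeled data followed by one label query per recovered cluster; the threshold r, the function names, and the use of SciPy are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def single_linkage_learn(X, query_label, r):
    # X: (n, d) array of unlabeled points
    # query_label(i): labeling oracle returning the true label of point i
    # r: merge-distance threshold (an assumed parameter)
    Z = linkage(pdist(X), method="single")             # single-linkage dendrogram
    clusters = fcluster(Z, t=r, criterion="distance")  # cut the tree at distance r
    labels = np.empty(len(X), dtype=object)
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        labels[idx] = query_label(idx[0])              # one label query per cluster
    return labels
```

The point of the sketch is the label-complexity claim in the abstract: labels are requested per cluster rather than per point, so the number of queries scales with the number of recovered clusters.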