Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Efficient active learning of sparse halfspaces with arbitrary bounded noise
Authors: Chicheng Zhang, Jie Shen, Pranjal Awasthi
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study active learning of homogeneous s-sparse halfspaces in Rd under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most for a parameter 2 , known as the bounded noise. This is the first efficient algorithm with label complexity polynomial in 1 1 2 in this setting, which is label-efficient even for arbitrarily close to 1 2. Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise. We develop an attribute-efficient learning algorithm that runs in polynomial time, and achieves a label complexity of O s (1 2 )4 polylog provided that the underlying Bayes classifier is an s-sparse halfspace. Theorem 2 (Main result). Suppose Algorithm 1 is run under a distribution D such that Assumptions 1 and 2 are satisfied. Then with probability 1 δ, it returns a halfspace u such that err(h u, D) err(hu, D) . Moreover, our algorithm tolerates any noise rate 2 [0, 1/2), and asks for a total of O # s (1 2 )4 polylog labeled examples. |
| Researcher Affiliation | Collaboration | Chicheng Zhang University of Arizona EMAIL Jie Shen Stevens Institute of Technology EMAIL Pranjal Awasthi Google Research and Rutgers University EMAIL |
| Pseudocode | Yes | Algorithm 1 Main algorithm, Algorithm 2 REFINE, Algorithm 3 INITIALIZE |
| Open Source Code | No | The paper does not mention releasing open-source code or provide any links to a code repository. |
| Open Datasets | No | The paper operates under theoretical assumptions about data distributions (e.g., "isotropic log-concave distribution") and does not refer to specific, named, publicly available datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental dataset splits for training, validation, or testing, nor does it refer to standard predefined splits for any dataset. |
| Hardware Specification | No | The paper focuses on theoretical algorithm design and complexity analysis. It does not describe any specific hardware (e.g., GPU/CPU models, memory) used to run experiments. |
| Software Dependencies | No | The paper describes algorithms and theoretical guarantees. It does not mention any specific software dependencies or their version numbers that would be needed to reproduce experiments. |
| Experiment Setup | No | The paper details algorithmic parameters (e.g., step sizes, bandwidth, number of iterations) within the pseudo-code and theorems, but these are derived theoretically in terms of other parameters (d, s, epsilon, delta, k). It does not provide concrete, numerical hyperparameter values or system-level training settings characteristic of an empirical experimental setup. |