Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random
Authors: Gautam Chandrasekaran, Vasilis Kontonis, Konstantinos Stavropoulos, Kevin Tian
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of PAC learning -margin halfspaces with Massart noise. We propose a simple proper learning algorithm, the Perspectron, that has sample complexity e O((ϵγ) 2) and achieves classification error at most η + ϵ where η is the Massart noise rate. Prior works [DGT19b, CKMY20] came with worse sample complexity guarantees (in both ϵ and γ) or could only handle random classification noise [DDK+23, KIT+23] a much milder noise assumption. We also show that our results extend to the more challenging setting of learning generalized linear models with a known link function under Massart noise, achieving a similar sample complexity to the halfspace case. This significantly improves upon the prior state-of-the-art in this setting due to [CKMY20], who introduced this model. |
| Researcher Affiliation | Academia | Gautam Chandrasekaran EMAIL UT Austin Vasilis Kontonis EMAIL UT Austin Konstantinos Stavropoulos EMAIL UT Austin Kevin Tian EMAIL UT Austin |
| Pseudocode | Yes | Algorithm 1: Perspectron |
| Open Source Code | No | The paper is theoretical and does not contain any statement about making its code open-source or providing a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments, therefore it does not refer to any dataset or its public availability for training purposes. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments or provide any information about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not include any experimental section, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not include an experimental setup or details on hyperparameters, as it does not conduct experiments. |