An Efficient Tester-Learner for Halfspaces

Authors: Aravind Gollakota, Adam Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (RV23). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models, where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is the standard Gaussian in d dimensions and the label noise is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error opt + ϵ (and extends to any fixed strongly log-concave target distribution). For adversarial noise, our tester-learner obtains error O(opt) + ϵ in polynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels, and we combine them with the moment-matching approach of (GKK23). This enables us to implement a testable variant of the algorithm of (DKTZ20a; DKTZ20b) for learning noisy halfspaces using nonconvex SGD. (A sketch of such a moment-matching test appears at the end of this report.)
Researcher Affiliation | Collaboration | Aravind Gollakota (Apple); Adam R. Klivans (UT Austin); Konstantinos Stavropoulos (UT Austin); Arsen Vasilyan (MIT)
Pseudocode | Yes | Algorithm 1: Tester-learner for halfspaces (a runnable sketch appears at the end of this report)
Input: Training sets S1, S2; parameters σ, δ, α
Output: A near-optimal weight vector w′, or rejection
1. Run PSGD on the empirical loss Lσ over S1 to obtain a list L of candidate vectors.
2. Test whether L contains an α-approximate stationary point w of the empirical loss Lσ over S2. Reject if no such w exists.
3. For each candidate w′ ∈ {w, −w}:
   a. Let B_{w′}(σ) denote the band {x : |⟨w′, x⟩| ≤ σ}. Let F_{w′} denote the class of functions of at most two halfspaces with weights orthogonal to w′. Let δ′ = Θ(δ).
   b. Run T1(S2, k = 2, δ′) to verify that the empirical marginal is approximately isotropic. Reject if T1 rejects.
   c. Run T2(S2, w′, σ, δ′) to verify that P_{S2}[B_{w′}(σ)] = Θ(σ). Reject if T2 rejects.
   d. Run T3(S2, w′, σ′ = σ/6, τ, δ′) and T3(S2, w′, σ′ = σ/2, τ, δ′) for a suitable constant τ to verify that the empirical distribution conditioned on B_{w′}(σ/6) and on B_{w′}(σ/2) fools F_{w′} up to τ. Reject if T3 rejects.
   e. Estimate the empirical error of w′ on S2.
4. If all tests have accepted, output the w′ ∈ {w, −w} with the best empirical error.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper discusses abstract target distributions like 'the standard Gaussian in d dimensions' and 'isotropic strongly log-concave distribution' but does not specify a publicly available dataset with concrete access information for training.
Dataset Splits | No | The paper discusses 'training sets S1, S2' but does not provide specific dataset split percentages or counts for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'nonconvex SGD' and 'PSGD' but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | No | The paper describes the overall approach, such as running SGD on the empirical loss Lσ and an 'appropriate selection of parameters' for σ, but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings.
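
For illustration, here is a minimal numpy sketch of a label-oblivious moment test of the kind the abstract attributes to prior work on testable learning; it also roughly plays the role of T1 in Algorithm 1, which checks that the empirical marginal is approximately isotropic. The function name, the tolerance tol, and the restriction to degree-2 moments are assumptions made for brevity, not the paper's exact test.

```python
import numpy as np

def moment_test(X, tol=0.1):
    # Label-oblivious moment check (illustrative): accept only if the empirical
    # mean and covariance of the covariates are close to those of the standard
    # Gaussian N(0, I_d). The tolerance `tol` and the degree-2 cutoff are
    # simplifying assumptions, not the parameters used in the paper.
    d = X.shape[1]
    mean_ok = np.linalg.norm(X.mean(axis=0)) <= tol
    cov_ok = np.linalg.norm(np.cov(X, rowvar=False) - np.eye(d), ord=2) <= tol
    return mean_ok and cov_ok

# Samples actually drawn from the target distribution should pass the test.
rng = np.random.default_rng(0)
print(moment_test(rng.standard_normal((20000, 5))))  # True with high probability
```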
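
And here is a self-contained Python sketch of the overall control flow of Algorithm 1, under several simplifying assumptions: the logistic surrogate for Lσ, the hyperparameters (σ, α, step size, batch size, step count), and the constants in band_mass_ok are illustrative placeholders; moment_test above would stand in for T1, and the fooling test T3 is omitted. This is a sketch of the procedure's shape, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_Lsigma(w, X, y, sigma):
    # Gradient of a smooth surrogate L_sigma(w) = mean(phi(-y*<w,x>/sigma)),
    # with phi the logistic sigmoid; the paper's exact surrogate may differ.
    z = -y * (X @ w) / sigma
    s = 1.0 / (1.0 + np.exp(-z))
    return ((s * (1.0 - s) / sigma)[:, None] * (-y[:, None] * X)).mean(axis=0)

def psgd(X, y, sigma, steps=3000, lr=0.05, batch=64):
    # Projected SGD over the unit sphere; all hyperparameters are illustrative.
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        i = rng.integers(len(X), size=batch)
        g = grad_Lsigma(w, X[i], y[i], sigma)
        g -= (g @ w) * w            # project the gradient onto the tangent space
        w -= lr * g
        w /= np.linalg.norm(w)      # project back onto the unit sphere
    return w

def is_stationary(w, X, y, sigma, alpha):
    # alpha-approximate stationarity of the empirical loss over (X, y).
    g = grad_Lsigma(w, X, y, sigma)
    g -= (g @ w) * w
    return np.linalg.norm(g) <= alpha

def band_mass_ok(w, X, sigma, lo=0.2, hi=2.0):
    # Sketch of test T2: the band {x : |<w,x>| <= sigma} should carry
    # Theta(sigma) probability mass; lo and hi are placeholder constants.
    p = np.mean(np.abs(X @ w) <= sigma)
    return lo * sigma <= p <= hi * sigma

def err01(w, X, y):
    return np.mean(np.sign(X @ w) != y)

# Synthetic data: Gaussian covariates, halfspace labels with 5% random flips.
d, n = 5, 20000
w_star = np.ones(d) / np.sqrt(d)
def sample():
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)
    return X, np.where(rng.random(n) < 0.05, -y, y)
X1, y1 = sample()
X2, y2 = sample()

sigma, alpha = 0.3, 0.1
w = psgd(X1, y1, sigma)                       # candidate from S1
if not is_stationary(w, X2, y2, sigma, alpha):
    print("reject: no alpha-approximate stationary point on S2")
elif not band_mass_ok(w, X2, sigma):
    print("reject: band-mass test (T2) failed")
else:
    best = min((w, -w), key=lambda v: err01(v, X2, y2))
    print("accept; empirical error on S2:", err01(best, X2, y2))
```

Projecting the gradient onto the tangent space before each step and renormalizing afterward is the standard way to run PSGD over the unit sphere, and it matches the notion of approximate stationarity checked in step 2 of Algorithm 1.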