A Characterization of Semi-Supervised Adversarially Robust PAC Learnability

Authors: Idan Attias, Steve Hanneke, Yishay Mansour

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of learning a predictor that is adversarially robust to test-time attacks in the semi-supervised PAC model. We address the question of how many labeled and unlabeled examples are required to ensure learning. We show that, given enough unlabeled data (on the order of the labeled sample size a fully supervised method would require), the labeled sample complexity can be arbitrarily smaller than in previous works, and is sharply characterized by a different complexity measure. We prove nearly matching upper and lower bounds on this sample complexity.
Researcher Affiliation | Collaboration | Idan Attias, Department of Computer Science, Ben-Gurion University of the Negev (idanatti@post.bgu.ac.il); Steve Hanneke, Department of Computer Science, Purdue University (steve.hanneke@gmail.com); Yishay Mansour, Blavatnik School of Computer Science, Tel Aviv University, and Google Research (mansour.yishay@gmail.com)
Pseudocode | Yes | Algorithm 1: Generic Adversarially-Robust Semi-Supervised (GRASS) learner. Input: labeled data set S_l ~ D^{m_l}, unlabeled data set S_u ⊆ X, hypothesis class H, perturbation function U, parameters ε, δ. Algorithms used: a PAC learner A for partial concept classes and an agnostic adversarially robust supervised PAC learner B.
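The row above describes Algorithm 1 only at the interface level (inputs and the two sub-learners A and B). Below is a minimal structural sketch of how such a two-stage learner could be wired together, under one natural reading of that description: A is fit on the small labeled sample, its predictions pseudo-label the unlabeled sample, and the robust learner B is then run on the pseudo-labeled data. The function names and the combination step are illustrative, not the paper's verbatim procedure.

```python
from typing import Any, Callable, Sequence, Tuple

# A predictor maps an example to a binary label.
Predictor = Callable[[Any], int]
# A learner maps a labeled sample to a predictor.
Learner = Callable[[Sequence[Tuple[Any, int]]], Predictor]


def grass_learner(
    labeled: Sequence[Tuple[Any, int]],
    unlabeled: Sequence[Any],
    learner_A: Learner,   # (assumed role) non-robust PAC learner for partial concepts
    learner_B: Learner,   # (assumed role) agnostic adversarially robust supervised learner
) -> Predictor:
    """Illustrative two-stage semi-supervised scheme:
    1. Fit a predictor on the small labeled sample with A.
    2. Pseudo-label the unlabeled sample with that predictor.
    3. Run the robust supervised learner B on the pseudo-labeled data.
    """
    h = learner_A(labeled)                   # stage 1: fit on labeled data
    pseudo = [(x, h(x)) for x in unlabeled]  # stage 2: pseudo-label unlabeled data
    return learner_B(pseudo)                 # stage 3: robust learning on pseudo-labels
```

The point of the structure is that only stage 1 consumes labeled examples, so the labeled sample complexity is governed by A, while B only ever sees (cheap) pseudo-labeled data.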
Open Source Code | No | The paper is theoretical and does not present empirical experiments, so no source code for the methodology is provided. The ethics statement explicitly marks "Did you include the code, data, and instructions needed to reproduce the main experimental results" as N/A.
Open Datasets | No | The paper is theoretical and does not involve experiments with specific public datasets. It discusses theoretical notions of samples and distributions (e.g., "labeled and unlabeled examples drawn i.i.d. from unknown distribution D"), but does not provide access information for any public dataset used for training.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with dataset splits. It defines theoretical sample complexity in terms of "labeled examples" and "unlabeled examples" but does not specify empirical training/validation/test splits.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. The ethics statement explicitly marks the relevant question (3d) as N/A.
Software Dependencies | No | The paper is theoretical and does not describe any software dependencies with specific version numbers. The ethics statement explicitly marks the relevant question (3a) on reproducibility of experimental results as N/A.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and algorithms. It does not provide specific experimental setup details, hyperparameters, or training configurations. The ethics statement marks the questions regarding experimental details as N/A.