A Theory of PAC Learnability under Transformation Invariances

Authors: Han Shao, Omar Montasser, Avrim Blum

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type | Theoretical | In this paper, we study PAC learnability under transformation invariances in three settings, according to different levels of realizability: (i) a hypothesis fits the augmented data; (ii) a hypothesis fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm that does not differentiate between the original and transformed data (including data augmentation) is not optimal. Furthermore, algorithms of this type can even harm accuracy. In setting (i), although it is unnecessary to distinguish between the two data sets, data augmentation still does not perform optimally. Owing to this difference, we propose two combinatorial measures characterizing the optimal sample complexity in setting (i) and in settings (ii)/(iii), and provide the optimal algorithms.
Researcher Affiliation | Academia | Han Shao (Toyota Technological Institute at Chicago, Chicago, 60637, han@ttic.edu); Omar Montasser (Toyota Technological Institute at Chicago, Chicago, 60637, omar@ttic.edu); Avrim Blum (Toyota Technological Institute at Chicago, Chicago, 60637, avrim@ttic.edu)
Pseudocode | No | The paper describes algorithms (e.g., the 1-inclusion-graph predictor and ERM-INV) but does not provide structured pseudocode blocks or sections labeled 'Algorithm'.
Open Source Code | No | The paper does not provide access to source code for the described methodology. The 'If you are including theoretical results...' section in the ethical considerations is marked 'N/A' for code.
Open Datasets | No | The paper is theoretical and does not report experiments with datasets. The 'If you ran experiments...' section in the ethical considerations is marked 'N/A' for data.
Dataset Splits | No | The paper is theoretical and reports no experiments, so no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and reports no experiments, so no hardware details are provided. The 'If you ran experiments...' section in the ethical considerations is marked 'N/A' for compute.
Software Dependencies | No | The paper is theoretical and reports no experiments, so no software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical and reports no experiments, so no experimental setup details, such as hyperparameters or training configurations, are provided.
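To make the distinction discussed in the abstract concrete, the following is a minimal, self-contained sketch (not the paper's algorithms; the toy domain, transformation, and all names are hypothetical) contrasting plain ERM on the original sample with "data augmentation" ERM, which adds transformed copies of every point without distinguishing them from the original data:

```python
# Illustrative sketch only: ERM over a finite hypothesis class on a toy
# two-bit domain, where the transformation flips the first bit. The
# domain, transformation, and hypothesis class are all hypothetical.
from itertools import product

DOMAIN = list(product([0, 1], repeat=2))  # {0,1}^2, four points

def transform(x):
    """Toy transformation: flip the first coordinate."""
    return (1 - x[0], x[1])

# Finite hypothesis class: every possible labeling of the four points.
HYPOTHESES = [dict(zip(DOMAIN, labels))
              for labels in product([0, 1], repeat=len(DOMAIN))]

def erm(sample):
    """Return a hypothesis minimizing empirical 0/1 loss on `sample`."""
    return min(HYPOTHESES,
               key=lambda h: sum(h[x] != y for x, y in sample))

def augment(sample):
    """Data augmentation: append the transformed copy of every labeled
    point, treating it exactly like original data."""
    return sample + [(transform(x), y) for x, y in sample]

sample = [((0, 0), 1), ((0, 1), 0)]
h_plain = erm(sample)         # fits only the original points
h_aug = erm(augment(sample))  # also forced to fit transformed copies
```

In setting (i) of the paper (the target fits all augmented data), forcing consistency with the transformed copies as `h_aug` does is harmless, though still not sample-optimal; in settings (ii)/(iii), merging the two data sets this way, with no way to tell original from transformed points, is what the paper shows can cost accuracy.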