A Universal Law of Robustness via Isoperimetry
Authors: Sébastien Bubeck, Mark Sellke
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution verifying isoperimetry (or a mixture thereof). (An informal restatement of the underlying bound is given after the table.) |
| Researcher Affiliation | Collaboration | Sébastien Bubeck (Microsoft Research) sebubeck@microsoft.com; Mark Sellke (Stanford University) msellke@stanford.edu |
| Pseudocode | No | The paper does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to a code repository for the methodology described. |
| Open Datasets | Yes | To put Theorem 1 in context, we compare to the empirical results presented in [MMS+18]. In the latter work, they consider the MNIST dataset which consists of n = 6 × 10^4 images in dimension 28^2 = 784. (The corresponding parameter count is worked out after the table.) |
| Dataset Splits | No | This paper is theoretical and focuses on proving a mathematical law. It discusses existing empirical results from other papers (e.g., [MMS+18]) but does not define or use its own dataset splits (train/validation/test) to reproduce experiments. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental implementation, thus no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experiments with hyperparameters or training configurations conducted by the authors. |
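
For context on the Research Type row: the quoted abstract compresses the paper's Theorem 1. The following is an informal restatement in our own notation, a paraphrase rather than the paper's exact statement, which carries additional conditions on weight magnitudes, the isoperimetry constant, and the noise level:

```latex
% Informal paraphrase of Theorem 1 (the law of robustness).
% F = { f_w : w in R^p } : function class smoothly parametrized by p
%                          polynomial-size parameters;
% (x_i, y_i), i = 1..n   : noisy data, covariates x_i in R^d drawn from
%                          an isoperimetric distribution (or a mixture).
% If f in F fits the n data points below the noise level, then with high
% probability its Lipschitz constant satisfies
\[
  \operatorname{Lip}(f) \;\gtrsim\; \sqrt{\frac{n d}{p}} .
\]
% Consequently an O(1)-Lipschitz (smooth) interpolator needs
% p = \tilde{\Omega}(nd) parameters, a factor d more than the p ~ n
% that suffices for mere (non-smooth) interpolation.
```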
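For the Open Datasets row, the quoted MNIST figures plug into that bound as follows; the arithmetic is ours, offered only to make the quoted numbers concrete:

```latex
\[
  n = 6 \times 10^{4}, \qquad
  d = 28^{2} = 784, \qquad
  n d = 6 \times 10^{4} \times 784 \approx 4.7 \times 10^{7},
\]
% i.e., the law of robustness places smooth interpolation of MNIST at
% roughly 5 x 10^7 parameters, versus ~6 x 10^4 for mere interpolation,
% which is the scale the paper compares against the models of [MMS+18].
```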