Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Authors: Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our work proves convergence to low robust training loss for polynomial width and running time, instead of exponential, under natural assumptions and with ReLU activation. A key element of our proof is showing that ReLU networks near initialization can approximate the step function, which may be of independent interest. (An elementary illustration of this kind of step-function approximation appears after the table.)
Researcher Affiliation | Academia | Yi Zhang (Princeton University, y.zhang@cs.princeton.edu); Orestis Plevrakis (Princeton University, orestisp@cs.princeton.edu); Simon S. Du (University of Washington, ssdu@cs.washington.edu); Xingguo Li (Princeton University, xingguol@cs.princeton.edu); Zhao Song (Institute for Advanced Study, zhao@ias.edu); Sanjeev Arora (Princeton University and IAS, arora@cs.princeton.edu)
Pseudocode | Yes | Algorithm 1: Adversarial training (a minimal sketch of this training loop appears after the table).
Open Source Code | No | The paper focuses on theoretical analysis and does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | In Figure 1 we show that on CIFAR-10, other than probably a very small fraction of examples, all the others do not have too small minimum distance from any example.
Dataset Splits | No | The paper is theoretical and does not describe specific dataset splits (training, validation, test) for empirical evaluation.
Hardware Specification | No | The paper focuses on theoretical analysis and does not mention specific hardware used for any computations or empirical observations.
Software Dependencies | No | The paper focuses on theoretical analysis and does not list any specific software dependencies with version numbers.
Experiment Setup | No | While Theorem 4.1 specifies hyper-parameters T = Θ(ε^{-2} R^2) and η = Θ(ε m^{-1/3}), these are theoretical bounds derived from the proof, not concrete experimental settings for a practical run.
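
On the step-function approximation mentioned under Research Type: a classical one-dimensional construction already shows how a rescaled difference of two ReLUs produces a ramp that converges to the step function as the threshold parameter τ shrinks. The display below is only this elementary illustration; it is not the paper's construction, which concerns ReLU networks staying near their random initialization.

```latex
% Elementary illustration (not the paper's construction): a rescaled
% difference of two ReLUs is a ramp that tends to the step function.
\[
  g_\tau(x) \;=\; \frac{\mathrm{ReLU}(x) - \mathrm{ReLU}(x-\tau)}{\tau}
  \;=\;
  \begin{cases}
    0 & x \le 0,\\
    x/\tau & 0 < x < \tau,\\
    1 & x \ge \tau,
  \end{cases}
  \qquad
  g_\tau(x) \;\longrightarrow\; \mathbf{1}\{x > 0\}
  \quad \text{pointwise as } \tau \to 0^{+}.
\]
```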
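
On Algorithm 1: adversarial training follows the standard two-level template of an inner maximization that searches for a worst-case perturbation of each input, followed by an outer gradient step on the loss at the perturbed input. Below is a minimal PyTorch-style sketch assuming a PGD-style attack as the inner adversary; the names (model, loader, optimizer, eps, alpha, steps) and the PGD attack itself are illustrative stand-ins, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps, alpha, steps):
    """Inner maximization: PGD-style search for a worst-case perturbation
    within an l_inf ball of radius eps (an illustrative stand-in for the
    adversary used in adversarial training)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascent step on the perturbation, then project back into the ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return delta.detach()

def adversarial_training(model, loader, optimizer, eps=8/255, alpha=2/255,
                         steps=10, epochs=1):
    """Outer minimization: gradient descent on the loss at perturbed inputs."""
    for _ in range(epochs):
        for x, y in loader:
            delta = pgd_perturb(model, x, y, eps, alpha, steps)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x + delta), y)
            loss.backward()
            optimizer.step()
```

The clamp call projects the perturbation back into the l_inf ball of radius eps after each ascent step; any other inner maximizer could be substituted without changing the outer loop.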