Gradient Methods Provably Converge to Non-Robust Networks

Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an empirical study. As we already mentioned, a limitation of our negative result is that it applies to the case where the size of the dataset is smaller than the input dimension. We show empirically that the same small perturbation from our negative result is also able to change the labels of almost all the examples in the dataset, even when it is much larger than the input dimension. In addition, our theoretical negative result holds regardless of the width of the network. We demonstrate it empirically, by showing that changing the width does not change the size of the minimal perturbation that flips the labels of the examples in the dataset.
Researcher Affiliation | Academia | Gal Vardi (TTI-Chicago and Hebrew University, galvardi@ttic.edu); Gilad Yehudai (Weizmann Institute of Science, gilad.yehudai@weizmann.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | We haven't included the code, since we only ran simple simulations, but provided all the details about them for reproduction.
Open Datasets | No | In all of our experiments we sampled $(x, y) \in \mathbb{R}^d \times \{-1, 1\}$ where $x \sim U(\sqrt{d} \cdot \mathbb{S}^{d-1})$ and $y$ is uniform on $\{-1, 1\}$. We also tested on $x$ sampled from a Gaussian distribution with variance $1/d$ and obtained similar results. Here we only report the results on the uniform distribution. (A sampling sketch appears below the table.)
Dataset Splits | No | The paper does not specify explicit training, validation, and test splits with percentages, sample counts, or references to predefined splits.
Hardware Specification | No | We did not use significant computing resources (such as GPUs, clusters or cloud services).
Software Dependencies | No | We implemented our experiments using PyTorch (Paszke et al. [2019]).
Experiment Setup | Yes | In all of our experiments we trained a depth-2 fully-connected neural network with ReLU activations using SGD with a batch size of 5,000. [...] We used the exponential loss [...]. We began training with a learning rate of $10^{-5}$ and increased it by a factor of 1.1 every 100 iterations. We finished training after we achieved a loss smaller than $10^{-30}$. (A training sketch appears below the table.)
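
The "Open Datasets" row describes how the synthetic data were drawn. Below is a minimal PyTorch sketch of that sampling step; the input dimension d and sample count n are illustrative assumptions, since the table above does not restate the exact values used in each experiment.

```python
import torch

d, n = 100, 5_000  # illustrative input dimension and sample count (assumptions)

# x is uniform on the sphere of radius sqrt(d): sample a standard Gaussian
# and rescale each row to have norm sqrt(d).
x = torch.randn(n, d)
x = (d ** 0.5) * x / x.norm(dim=1, keepdim=True)

# y is uniform on {-1, +1}.
y = torch.randint(0, 2, (n,)).float() * 2 - 1

# The Gaussian variant mentioned in the quote: each coordinate has variance 1/d.
x_gauss = torch.randn(n, d) / d ** 0.5
```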
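
The "Experiment Setup" row similarly admits a short reconstruction. The sketch below trains a depth-2 fully-connected ReLU network with the exponential loss, starting from a learning rate of $10^{-5}$, multiplying it by 1.1 every 100 iterations, and stopping once the loss falls below $10^{-30}$. The hidden width and the use of a single full batch of 5,000 examples are assumptions made for illustration; this is not the authors' released code.

```python
import torch
import torch.nn as nn

d, n, width = 100, 5_000, 1_000  # illustrative dimensions (assumptions)

# Data sampled as in the sketch above: x uniform on the radius-sqrt(d) sphere,
# y uniform on {-1, +1}.
x = torch.randn(n, d)
x = (d ** 0.5) * x / x.norm(dim=1, keepdim=True)
y = torch.randint(0, 2, (n,)).float() * 2 - 1

# Depth-2 fully-connected ReLU network with a scalar output.
model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

def exp_loss(out, target):
    # Exponential loss: mean of exp(-y * f(x)).
    return torch.exp(-target * out.squeeze(1)).mean()

lr = 1e-5
opt = torch.optim.SGD(model.parameters(), lr=lr)

step = 0
while True:
    opt.zero_grad()
    loss = exp_loss(model(x), y)  # one batch of 5,000 examples
    loss.backward()
    opt.step()
    step += 1
    if step % 100 == 0:           # increase the learning rate by a factor of 1.1 every 100 iterations
        lr *= 1.1
        for group in opt.param_groups:
            group["lr"] = lr
    if loss.item() < 1e-30:       # stop once the loss is below 10^-30
        break
```

Note that under the 1.1x schedule the learning rate grows geometrically, so the loop above is only meant to mirror the reported schedule and stopping criterion, not to serve as a general training recipe.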