Gradient Methods Provably Converge to Non-Robust Networks
Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with an empirical study. As we already mentioned, a limitation of our negative result is that it applies to the case where the size of the dataset is smaller than the input dimension. We show empirically that the same small perturbation from our negative result is also able to change the labels of almost all the examples in the dataset, even when it is much larger than the input dimension. In addition, our theoretical negative result holds regardless of the width of the network. We demonstrate it empirically, by showing that changing the width does not change the size of the minimal perturbation that flips the labels of the examples in the dataset. |
| Researcher Affiliation | Academia | Gal Vardi (TTI-Chicago and Hebrew University, galvardi@ttic.edu); Gilad Yehudai (Weizmann Institute of Science, gilad.yehudai@weizmann.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | We haven't included the code, since we only ran simple simulations, but provided all the details about them for reproduction. |
| Open Datasets | No | In all of our experiments we sampled (x, y) ∈ ℝ^d × {−1, 1} where x ∼ U(√d · S^{d−1}) and y is uniform on {−1, 1}. We also tested on x sampled from a Gaussian distribution with variance 1/d and obtained similar results. Here we only report the results on the uniform distribution. (See the sampling sketch below the table.) |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | We did not use significant computing resources (such as GPUs, clusters or cloud services). |
| Software Dependencies | No | We implemented our experiments using PyTorch (Paszke et al. [2019]). |
| Experiment Setup | Yes | In all of our experiments we trained a depth-2 fully-connected neural network with ReLU activations using SGD with a batch size of 5,000. [...] We used the exponential loss [...]. We began training with a learning rate of 10⁻⁵ and increased it by a factor of 1.1 every 100 iterations. We finished training after we achieved a loss smaller than 10⁻³⁰. (See the training sketch below the table.) |
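
The data-generation step quoted in the Open Datasets row is simple enough to sketch in code. The snippet below is our reading of that description, not code released by the authors: it assumes the uniform distribution is over the sphere of radius √d (a standard Gaussian projected onto the sphere and rescaled) and that the Gaussian alternative has per-coordinate variance 1/d; the function name and signature are illustrative.

```python
import torch

def sample_dataset(n: int, d: int, gaussian: bool = False):
    """Sample n pairs (x, y) as described in the quoted setup (our interpretation)."""
    if gaussian:
        # Alternative distribution mentioned in the quote: per-coordinate variance 1/d.
        x = torch.randn(n, d) / d ** 0.5
    else:
        # Uniform on the sphere of radius sqrt(d): normalize a Gaussian sample, then rescale.
        g = torch.randn(n, d)
        x = g / g.norm(dim=1, keepdim=True) * d ** 0.5
    # Labels uniform on {-1, +1}.
    y = 2 * torch.randint(0, 2, (n,)).float() - 1
    return x, y
```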
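
The training details in the Experiment Setup row can likewise be turned into a short script. This is a hedged sketch, not the authors' code: the input dimension, width, dataset size, and iteration cap are placeholder values, the 5,000-example batch is treated as the full dataset, and the schedule and stopping rule follow the quote (start at 10⁻⁵, multiply by 1.1 every 100 iterations, stop once the loss drops below 10⁻³⁰).

```python
import torch
import torch.nn as nn

d, width, n = 100, 1000, 5_000        # placeholder dimension, width, and dataset size
max_iters = 100_000                   # placeholder iteration cap

# Depth-2 fully-connected ReLU network with a scalar output.
model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

# Data as in the previous sketch: x uniform on the sphere of radius sqrt(d), y uniform on {-1, +1}.
g = torch.randn(n, d)
x = g / g.norm(dim=1, keepdim=True) * d ** 0.5
y = 2 * torch.randint(0, 2, (n,)).float() - 1

lr = 1e-5
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for it in range(max_iters):
    optimizer.zero_grad()
    margin = y * model(x).squeeze(1)
    loss = torch.exp(-margin).mean()  # exponential loss
    loss.backward()
    optimizer.step()

    if (it + 1) % 100 == 0:           # raise the learning rate by a factor of 1.1 every 100 iterations
        lr *= 1.1
        for group in optimizer.param_groups:
            group["lr"] = lr

    if loss.item() < 1e-30:           # stop once the loss is below 10^-30
        break
```

Because the batch size equals the (assumed) dataset size here, each step is effectively full-batch gradient descent; reaching a loss of 10⁻³⁰ can require many iterations, so the cap above is only a convenience for the sketch.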