Gradient Methods Provably Converge to Non-Robust Networks

Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an empirical study. As we already mentioned, a limitation of our negative result is that it applies to the case where the size of the dataset is smaller than the input dimension. We show empirically that the same small perturbation from our negative result is also able to change the labels of almost all the examples in the dataset, even when it is much larger than the input dimension. In addition, our theoretical negative result holds regardless of the width of the network. We demonstrate it empirically, by showing that changing the width does not change the size of the minimal perturbation that flips the labels of the examples in the dataset.
Researcher Affiliation | Academia | Gal Vardi (TTI-Chicago and Hebrew University, galvardi@ttic.edu); Gilad Yehudai (Weizmann Institute of Science, gilad.yehudai@weizmann.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | We haven't included the code, since we only ran simple simulations, but provided all the details about them for reproduction.
Open Datasets | No | In all of our experiments we sampled $(x, y) \in \mathbb{R}^d \times \{-1, 1\}$ where $x \sim U(\sqrt{d} \cdot \mathbb{S}^{d-1})$ and $y$ is uniform on $\{-1, 1\}$. We also tested on $x$ sampled from a Gaussian distribution with variance $1/d$ and obtained similar results. Here we only report the results on the uniform distribution. (A sampling sketch appears below the table.)
Dataset Splits | No | The paper does not specify explicit training, validation, and test splits with percentages, sample counts, or references to predefined splits.
Hardware Specification | No | We did not use significant computing resources (such as GPUs, clusters or cloud services).
Software Dependencies | No | We implemented our experiments using PyTorch (Paszke et al. [2019]).
Experiment Setup | Yes | In all of our experiments we trained a depth-2 fully-connected neural network with ReLU activations using SGD with a batch size of 5,000. [...] We used the exponential loss [...]. We began training with a learning rate of $10^{-5}$ and increased it by a factor of 1.1 every 100 iterations. We finished training after we achieved a loss smaller than $10^{-30}$. (A training sketch appears below the table.)
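
The "Open Datasets" row describes how the synthetic data were drawn. Below is a minimal PyTorch sketch of that sampling step; the input dimension d and sample count n are illustrative assumptions, since the table above does not restate the exact values used in each experiment.

```python
import torch

d, n = 100, 5_000  # illustrative input dimension and sample count (assumptions)

# x is uniform on the sphere of radius sqrt(d): sample a standard Gaussian
# and rescale each row to have norm sqrt(d).
x = torch.randn(n, d)
x = (d ** 0.5) * x / x.norm(dim=1, keepdim=True)

# y is uniform on {-1, +1}.
y = torch.randint(0, 2, (n,)).float() * 2 - 1

# The Gaussian variant mentioned in the quote: each coordinate has variance 1/d.
x_gauss = torch.randn(n, d) / d ** 0.5
```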
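
The "Experiment Setup" row similarly admits a short reconstruction. The sketch below trains a depth-2 fully-connected ReLU network with the exponential loss, starting from a learning rate of $10^{-5}$, multiplying it by 1.1 every 100 iterations, and stopping once the loss falls below $10^{-30}$. The hidden width and the use of a single full batch of 5,000 examples are assumptions made for illustration; this is not the authors' released code.

```python
import torch
import torch.nn as nn

d, n, width = 100, 5_000, 1_000  # illustrative dimensions (assumptions)

# Data sampled as in the sketch above: x uniform on the radius-sqrt(d) sphere,
# y uniform on {-1, +1}.
x = torch.randn(n, d)
x = (d ** 0.5) * x / x.norm(dim=1, keepdim=True)
y = torch.randint(0, 2, (n,)).float() * 2 - 1

# Depth-2 fully-connected ReLU network with a scalar output.
model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

def exp_loss(out, target):
    # Exponential loss: mean of exp(-y * f(x)).
    return torch.exp(-target * out.squeeze(1)).mean()

lr = 1e-5
opt = torch.optim.SGD(model.parameters(), lr=lr)

step = 0
while True:
    opt.zero_grad()
    loss = exp_loss(model(x), y)  # one batch of 5,000 examples
    loss.backward()
    opt.step()
    step += 1
    if step % 100 == 0:           # increase the learning rate by a factor of 1.1 every 100 iterations
        lr *= 1.1
        for group in opt.param_groups:
            group["lr"] = lr
    if loss.item() < 1e-30:       # stop once the loss is below 10^-30
        break
```

Note that under the 1.1x schedule the learning rate grows geometrically, so the loop above is only meant to mirror the reported schedule and stopping criterion, not to serve as a general training recipe.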