Wide Two-Layer Networks can Learn from Adversarial Perturbations

Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A comprehensive set of experiments conducted to validate our theorems can be found in Appendix B. In this section, we briefly present two results that confirm Theorem 3.4. As the training dataset D := {(x_n, y_n)}_{n=1}^N, we employed a synthetic dataset so that the input dimension can be changed easily, which effectively helps perturbation learning in both scenarios, as predicted by our theorems. Note that perturbation learning on real-world datasets can be found in the literature [22, 28]. We generated synthetic data and labels from a mean-shifted Gaussian distribution as follows: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I), and y_n is set to one if n ∈ [N/2] and minus one otherwise. The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. The experimental results for perturbation learning under Scenario (a) are shown in Fig. 3. A high input dimension facilitates the alignment between f and g. Our theoretical results assume a wide network width, and Fig. 3 indicates that a sufficiently large width consistently stabilizes the alignment. (A data-generation sketch for this setup appears after the table.)
Researcher Affiliation | Academia | Soichiro Kumano (The University of Tokyo, kumano@cvm.t.u-tokyo.ac.jp); Hiroshi Kera (Chiba University, Zuse Institute Berlin, kera@chiba-u.jp); Toshihiko Yamasaki (The University of Tokyo, yamasaki@cvm.t.u-tokyo.ac.jp)
Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/s-kumano/perturbation-learning.
Open Datasets | Yes | We utilized two synthetic datasets and two widely used datasets, MNIST [11] and Fashion-MNIST [45]. The first synthetic dataset is derived from a zero-mean Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0, I) and {y_n}_{n=1}^N are independently sampled from U({±1}). The second synthetic dataset is based on a mean-shifted Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I) and y_n is set to one for n ∈ [N/2] and minus one otherwise. We used data only from classes 1 and 2 in MNIST (i.e., digits 1 and 2) and from classes 0 and 9 in Fashion-MNIST (i.e., T-shirt and ankle boot). (A sketch of these dataset constructions appears after the table.)
Dataset Splits | No | The paper mentions training and test datasets, but no explicit validation split details are provided.
Hardware Specification | Yes | Our experiments were conducted on an NVIDIA A100.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. ... We used non-stochastic gradient descent (i.e., each gradient computation uses the entire dataset) with 0.9 momentum and a learning-rate scheduler that multiplies the learning rate by 0.1 when the training loss has stopped improving for 10 epochs. For Figs. 3 to A12, we selected the best accuracy, agreement ratio, and cosine similarity over training runs with multiple initial learning rates. (An optimizer/scheduler sketch appears after the table.)
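
The mean-shifted Gaussian setup quoted under Research Type reduces to a few lines of data generation. The sketch below is an assumption rather than the authors' released code: the helper name make_mean_shifted_gaussian is illustrative and PyTorch is assumed; only d = 100, N = 1,000, and the 0.3 y_n 1 mean shift come from the quoted text.

```python
# Minimal sketch (assumed, not the authors' code) of the mean-shifted Gaussian
# training set: x_n ~ N(0.3 * y_n * 1, I), with y_n = +1 for the first half of
# the samples and -1 for the second half.
import torch

def make_mean_shifted_gaussian(N: int = 1000, d: int = 100, shift: float = 0.3):
    y = torch.cat([torch.ones(N // 2), -torch.ones(N - N // 2)])   # labels +1 / -1
    x = shift * y[:, None] * torch.ones(d) + torch.randn(N, d)     # mean 0.3*y*1, identity covariance
    return x, y

x, y = make_mean_shifted_gaussian()
print(x.shape, y.shape)  # torch.Size([1000, 100]) torch.Size([1000])
```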
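The datasets listed under Open Datasets can be sketched similarly. The snippet below is a hedged illustration, not the repository's loader: make_zero_mean_gaussian and binary_subset are hypothetical helper names, and the +1/-1 label mapping and pixel scaling are assumptions; only the Gaussian definitions and the class pairs (MNIST digits 1 vs. 2, Fashion-MNIST classes 0 vs. 9) come from the quoted text.

```python
# Minimal sketch (assumed) of the zero-mean Gaussian dataset and the two-class
# subsets of MNIST / Fashion-MNIST described above.
import torch
from torchvision import datasets

def make_zero_mean_gaussian(N: int = 1000, d: int = 100):
    x = torch.randn(N, d)                              # x_n ~ N(0, I)
    y = 2.0 * torch.randint(0, 2, (N,)).float() - 1.0  # y_n ~ U({+1, -1})
    return x, y

def binary_subset(ds, pos_class: int, neg_class: int):
    # Keep only two classes and map them to labels +1 / -1 (mapping assumed).
    mask = (ds.targets == pos_class) | (ds.targets == neg_class)
    x = ds.data[mask].float().flatten(1) / 255.0
    y = 2.0 * (ds.targets[mask] == pos_class).float() - 1.0
    return x, y

mnist = datasets.MNIST("./data", train=True, download=True)
fmnist = datasets.FashionMNIST("./data", train=True, download=True)
x_mnist, y_mnist = binary_subset(mnist, 1, 2)     # digits 1 vs. 2
x_fmnist, y_fmnist = binary_subset(fmnist, 0, 9)  # T-shirt vs. ankle boot
```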
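Finally, the training recipe quoted under Experiment Setup (full-batch gradient descent, 0.9 momentum, learning rate multiplied by 0.1 after 10 epochs without improvement) maps onto standard PyTorch components. This is a sketch of that mapping under assumed choices (a two-layer ReLU model of width m = 100, a placeholder loss, and ReduceLROnPlateau as the scheduler), not the authors' training script.

```python
# Minimal sketch (assumed) of full-batch gradient descent with 0.9 momentum and
# a plateau-based learning-rate schedule (factor 0.1, patience 10).
import torch
import torch.nn.functional as F

d, m, N, steps = 100, 100, 1000, 1000
model = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.ReLU(), torch.nn.Linear(m, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)

x, y = torch.randn(N, d), torch.sign(torch.randn(N))  # placeholder data

for step in range(steps):
    optimizer.zero_grad()
    margin = y * model(x).squeeze(1)    # full batch: gradient over all N samples
    loss = F.softplus(-margin).mean()   # placeholder loss; the paper defines its own l(s)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())         # multiply lr by 0.1 when the loss plateaus
```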