Wide Two-Layer Networks can Learn from Adversarial Perturbations
Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A comprehensive set of experiments conducted to validate our theorems can be found in Appendix B. In this section, we briefly present two results that confirm Theorem 3.4. As a training dataset D := {(x_n, y_n)}_{n=1}^N, we employed a synthetic training dataset to easily change the input dimension, which effectively helps perturbation learning in both scenarios, as predicted by our theorems. Note that perturbation learning on real-world datasets can be found in the literature [22, 28]. We generated synthetic data and labels from a mean-shifted Gaussian distribution as follows: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I), and y_n is set to one if n ∈ [N/2] and minus one otherwise. The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. The experimental results for perturbation learning under Scenario (a) are shown in Fig. 3. A high input dimension facilitates the alignment between f and g. Our theoretical results assume a wide network, and Fig. 3 indicates that a sufficiently large width consistently stabilizes the alignment. (A data-generation sketch follows the table.) |
| Researcher Affiliation | Academia | Soichiro Kumano, The University of Tokyo, kumano@cvm.t.u-tokyo.ac.jp; Hiroshi Kera, Chiba University / Zuse Institute Berlin, kera@chiba-u.jp; Toshihiko Yamasaki, The University of Tokyo, yamasaki@cvm.t.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/s-kumano/perturbation-learning. |
| Open Datasets | Yes | We utilized two synthetic datasets and two widely used datasets, MNIST [11] and Fashion-MNIST [45]. The first synthetic dataset is derived from a zero-mean Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0, I) and {y_n}_{n=1}^N are independently sampled from U({±1}). The second synthetic dataset is based on a mean-shifted Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I) and y_n is set to one for n ∈ [N/2] and minus one otherwise. We used data only from classes 1 and 2 in MNIST (i.e., digits 1 and 2) and those from classes 0 and 9 in Fashion-MNIST (i.e., T-shirt and ankle boot). (A subset-extraction sketch follows the table.) |
| Dataset Splits | No | The paper mentions training and test datasets, but provides no explicit validation-split details. |
| Hardware Specification | Yes | Our experiments were conducted on an NVIDIA A100. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. ... We used non-stochastic gradient descent (i.e., each gradient computation uses the entire dataset) with 0.9 momentum and a learning-rate scheduler that multiplies the learning rate by 0.1 when the training loss has stopped improving for 10 epochs. For Figs. 3 to A12, we selected the best accuracy, agreement ratio, and cosine similarity from training runs with multiple initial learning rates. (A training-loop sketch follows the table.) |
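
The mean-shifted Gaussian dataset quoted in the Research Type and Open Datasets rows is straightforward to regenerate. The sketch below is a minimal reconstruction assuming a PyTorch workflow; the function name, the seed handling, and the default arguments are illustrative and not taken from the authors' repository.

```python
import torch

def mean_shifted_gaussian(N: int = 1_000, d: int = 100, shift: float = 0.3,
                          seed: int = 0) -> tuple[torch.Tensor, torch.Tensor]:
    """x_n ~ N(shift * y_n * 1, I); y_n = +1 for the first N/2 samples, -1 otherwise."""
    gen = torch.Generator().manual_seed(seed)
    y = torch.cat([torch.ones(N // 2), -torch.ones(N - N // 2)])   # labels in {+1, -1}
    x = shift * y[:, None] + torch.randn(N, d, generator=gen)      # broadcast the mean shift over all d coordinates
    return x, y

X, Y = mean_shifted_gaussian()   # shapes: (1000, 100) and (1000,)
```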
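For the real-world datasets, the Open Datasets row states that only two classes are kept from each dataset. One possible way to extract those binary subsets with torchvision is sketched below; the flattening and [0, 1] scaling are assumptions about preprocessing that the report does not specify.

```python
import torch
from torchvision import datasets

def binary_subset(ds, pos: int, neg: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Keep only two classes and relabel them as +1 (pos) and -1 (neg)."""
    mask = (ds.targets == pos) | (ds.targets == neg)
    x = ds.data[mask].float().div(255).flatten(1)      # assumed preprocessing: flatten 28x28 images, scale to [0, 1]
    y = (ds.targets[mask] == pos).float() * 2 - 1      # +1 for `pos`, -1 for `neg`
    return x, y

mnist = datasets.MNIST("data", train=True, download=True)
fmnist = datasets.FashionMNIST("data", train=True, download=True)

x_m, y_m = binary_subset(mnist, pos=1, neg=2)    # digits 1 vs. 2
x_f, y_f = binary_subset(fmnist, pos=0, neg=9)   # T-shirt vs. ankle boot
```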
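The Experiment Setup row describes full-batch gradient descent with 0.9 momentum and a plateau-based learning-rate scheduler. A minimal training-loop sketch is shown below, assuming PyTorch and a generic two-layer ReLU network as a stand-in for f; the initial learning rate, the network parameterization, and the sign convention of the linear loss ℓ(s) = s are assumptions, not the authors' implementation.

```python
import torch
from torch import nn

d, m = 100, 100                                                        # input dimension and width from the report
model = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))     # stand-in two-layer network

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr=0.1 is a placeholder; several initial rates were tried
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10)  # x0.1 after 10 epochs without improvement

def train_full_batch(x: torch.Tensor, y: torch.Tensor, steps: int = 1_000) -> None:
    """Non-stochastic gradient descent: every step uses the entire dataset."""
    for _ in range(steps):
        optimizer.zero_grad()
        loss = (-y * model(x).squeeze(1)).mean()   # linear loss l(s) = s applied to -y*f(x); sign convention assumed
        loss.backward()
        optimizer.step()
        scheduler.step(loss.item())                # plateau scheduler tracks the training loss
```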