Wide Two-Layer Networks can Learn from Adversarial Perturbations

Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A comprehensive set of experiments conducted to validate our theorems can be found in Appendix B. In this section, we briefly present two results that confirm Theorem 3.4. As the training dataset D := {(x_n, y_n)}_{n=1}^N, we employed a synthetic dataset so that the input dimension can be changed easily, which effectively helps perturbation learning in both scenarios, as predicted by our theorems. Note that perturbation learning on real-world datasets can be found in the literature [22, 28]. We generated synthetic data and labels from a mean-shifted Gaussian distribution as follows: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I), and y_n is set to one if n ∈ [N/2] and minus one otherwise. The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. The experimental results for perturbation learning under Scenario (a) are shown in Fig. 3. A high input dimension facilitates the alignment between f and g. Our theoretical results assume a wide network width, and Fig. 3 indicates that a sufficiently large width consistently stabilizes the alignment. (A data-generation sketch for this setup appears after the table.)
Researcher Affiliation | Academia | Soichiro Kumano (The University of Tokyo, kumano@cvm.t.u-tokyo.ac.jp); Hiroshi Kera (Chiba University, Zuse Institute Berlin, kera@chiba-u.jp); Toshihiko Yamasaki (The University of Tokyo, yamasaki@cvm.t.u-tokyo.ac.jp)
Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/s-kumano/perturbation-learning.
Open Datasets | Yes | We utilized two synthetic datasets and two widely used datasets, MNIST [11] and Fashion-MNIST [45]. The first synthetic dataset is derived from a zero-mean Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0, I) and {y_n}_{n=1}^N are independently sampled from U({±1}). The second synthetic dataset is based on a mean-shifted Gaussian distribution: {x_n}_{n=1}^N are independently sampled from N(0.3 y_n 1, I) and y_n is set to one for n ∈ [N/2] and minus one otherwise. We used data only from classes 1 and 2 in MNIST (i.e., digits 1 and 2) and from classes 0 and 9 in Fashion-MNIST (i.e., T-shirt and ankle boot). (A sketch of these dataset constructions appears after the table.)
Dataset Splits | No | The paper mentions training and test datasets, but no explicit validation split details are provided.
Hardware Specification | Yes | Our experiments were conducted on an NVIDIA A100.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The experimental settings are as follows: d = 100, N = 1,000, m = 100, γ = 0, ℓ(s) := s, ϵ = 0.01, and the number of training steps is set to 1,000 for both f and g. ... We used non-stochastic gradient descent (i.e., each gradient computation uses the entire dataset) with 0.9 momentum and a learning-rate scheduler that multiplies the learning rate by 0.1 when the training loss has stopped improving for 10 epochs. For Figs. 3 to A12, we selected the best accuracy, agreement ratio, and cosine similarity over training runs with multiple initial learning rates. (An optimizer/scheduler sketch appears after the table.)
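
The mean-shifted Gaussian setup quoted under Research Type reduces to a few lines of data generation. The sketch below is an assumption rather than the authors' released code: the helper name make_mean_shifted_gaussian is illustrative and PyTorch is assumed; only d = 100, N = 1,000, and the 0.3 y_n 1 mean shift come from the quoted text.

```python
# Minimal sketch (assumed, not the authors' code) of the mean-shifted Gaussian
# training set: x_n ~ N(0.3 * y_n * 1, I), with y_n = +1 for the first half of
# the samples and -1 for the second half.
import torch

def make_mean_shifted_gaussian(N: int = 1000, d: int = 100, shift: float = 0.3):
    y = torch.cat([torch.ones(N // 2), -torch.ones(N - N // 2)])   # labels +1 / -1
    x = shift * y[:, None] * torch.ones(d) + torch.randn(N, d)     # mean 0.3*y*1, identity covariance
    return x, y

x, y = make_mean_shifted_gaussian()
print(x.shape, y.shape)  # torch.Size([1000, 100]) torch.Size([1000])
```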
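The datasets listed under Open Datasets can be sketched similarly. The snippet below is a hedged illustration, not the repository's loader: make_zero_mean_gaussian and binary_subset are hypothetical helper names, and the +1/-1 label mapping and pixel scaling are assumptions; only the Gaussian definitions and the class pairs (MNIST digits 1 vs. 2, Fashion-MNIST classes 0 vs. 9) come from the quoted text.

```python
# Minimal sketch (assumed) of the zero-mean Gaussian dataset and the two-class
# subsets of MNIST / Fashion-MNIST described above.
import torch
from torchvision import datasets

def make_zero_mean_gaussian(N: int = 1000, d: int = 100):
    x = torch.randn(N, d)                              # x_n ~ N(0, I)
    y = 2.0 * torch.randint(0, 2, (N,)).float() - 1.0  # y_n ~ U({+1, -1})
    return x, y

def binary_subset(ds, pos_class: int, neg_class: int):
    # Keep only two classes and map them to labels +1 / -1 (mapping assumed).
    mask = (ds.targets == pos_class) | (ds.targets == neg_class)
    x = ds.data[mask].float().flatten(1) / 255.0
    y = 2.0 * (ds.targets[mask] == pos_class).float() - 1.0
    return x, y

mnist = datasets.MNIST("./data", train=True, download=True)
fmnist = datasets.FashionMNIST("./data", train=True, download=True)
x_mnist, y_mnist = binary_subset(mnist, 1, 2)     # digits 1 vs. 2
x_fmnist, y_fmnist = binary_subset(fmnist, 0, 9)  # T-shirt vs. ankle boot
```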
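Finally, the training recipe quoted under Experiment Setup (full-batch gradient descent, 0.9 momentum, learning rate multiplied by 0.1 after 10 epochs without improvement) maps onto standard PyTorch components. This is a sketch of that mapping under assumed choices (a two-layer ReLU model of width m = 100, a placeholder loss, and ReduceLROnPlateau as the scheduler), not the authors' training script.

```python
# Minimal sketch (assumed) of full-batch gradient descent with 0.9 momentum and
# a plateau-based learning-rate schedule (factor 0.1, patience 10).
import torch
import torch.nn.functional as F

d, m, N, steps = 100, 100, 1000, 1000
model = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.ReLU(), torch.nn.Linear(m, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)

x, y = torch.randn(N, d), torch.sign(torch.randn(N))  # placeholder data

for step in range(steps):
    optimizer.zero_grad()
    margin = y * model(x).squeeze(1)    # full batch: gradient over all N samples
    loss = F.softplus(-margin).mean()   # placeholder loss; the paper defines its own l(s)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())         # multiply lr by 0.1 when the loss plateaus
```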