The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

Authors: Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All empirical figures in this work were generated by the attached notebook. Here we briefly summarize the experimental conditions used to generate these figures. ... Using these initializations and SciPy's initial value problem ODE solver Virtanen et al. (2020) we then simulated gradient flow until T = 1e5. The final value of the classifier for both models and their respective robustness was recorded and used to generate the final plots. (See the gradient-flow sketch after the table.)
Researcher Affiliation | Academia | Daniel Kunin & Atsushi Yamamura (山村篤志), Stanford University, {kunin,atsushi3}@stanford.edu; Chao Ma & Surya Ganguli, Stanford University, {chaoma,sganguli}@stanford.edu
Pseudocode | No | No, the paper describes theoretical concepts and proofs but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | All empirical figures in this work were generated by the attached notebook.
Open Datasets | No | Logistic Regression (Fig. 1). This plot was generated by sampling 100 samples from two Gaussian distributions N(µ, σI) in R^2 where µ = [1/√2] and σ = 0.25. ... Ball Classification (Fig. 4). This plot was generated by sampling 1e4 random samples from the surface of two balls B(µ, r) in R^3 for 100 linearly spaced radii r ∈ [0, 1]. (See the data-generation sketch after the table.)
Dataset Splits | No | No, the paper describes synthetic data generation and gradient flow simulations but does not specify any training/validation/test dataset splits.
Hardware Specification | No | No, the paper describes the simulation of gradient flow and the use of the SciPy library but does not specify any hardware details such as CPU or GPU models.
Software Dependencies | Yes | Using these initializations and SciPy's initial value problem ODE solver Virtanen et al. (2020) we then simulated gradient flow until T = 1e5. ... The maximum ℓ2-margin solution was computed using scikit-learn's SVM package Pedregosa et al. (2011). (See the SVM margin sketch after the table.)
Experiment Setup | Yes | The parameters were trained with full batch gradient descent with a learning rate η = 0.5 for 1e5 steps. ... Using these initializations and SciPy's initial value problem ODE solver Virtanen et al. (2020) we then simulated gradient flow until T = 1e5. (See the full-batch gradient descent sketch after the table.)
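
Data-generation sketch (Open Datasets row). This is a minimal reconstruction, not the authors' notebook: the quoted description fixes the dimensionality, σ = 0.25, the sample counts, and the radius grid, while the Gaussian means, class labeling, ball centers, and how the 1e4 samples are split across radii are assumptions.

```python
# Minimal sketch of the synthetic data described in the Open Datasets row.
# Assumptions (not stated in the quote): the two Gaussians are centered at
# +mu and -mu, and the two balls are centered at +/- e_1 in R^3.
import numpy as np

rng = np.random.default_rng(0)

# Logistic regression data (Fig. 1): 100 samples from two Gaussians N(mu, sigma*I) in R^2.
mu = np.array([1.0, 1.0]) / np.sqrt(2)   # assumed unit-norm mean
sigma = 0.25
X = np.vstack([rng.normal(loc=mu, scale=sigma, size=(50, 2)),
               rng.normal(loc=-mu, scale=sigma, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

# Ball classification data (Fig. 4): points on the surface of two balls B(mu, r)
# in R^3 for 100 linearly spaced radii r in [0, 1]. The allocation of the 1e4
# samples across radii and centers is an assumption.
def sample_ball_surface(center, radius, n, rng):
    """Uniform samples on the sphere of the given radius around `center`."""
    v = rng.normal(size=(n, center.size))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return center + radius * v

centers = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])  # assumed centers
radii = np.linspace(0.0, 1.0, 100)
ball_samples = [sample_ball_surface(c, r, 50, rng) for r in radii for c in centers]
```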
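Gradient-flow sketch (Research Type row). A minimal illustration of simulating gradient flow with SciPy's initial value problem solver until T = 1e5, as quoted above; the linear model, logistic loss, and small-initialization scale are assumptions made to keep the example self-contained, not the models studied in the paper.

```python
# Minimal sketch: gradient flow d(theta)/dt = -grad L(theta) integrated with
# SciPy's initial value problem solver until T = 1e5, as quoted above.
# Assumptions: linear model, logistic loss, small random initialization.
import numpy as np
from scipy.integrate import solve_ivp

def stable_sigmoid(z):
    return 1.0 / (1.0 + np.exp(np.clip(-z, -500, 500)))

def gradient_flow_rhs(X, y):
    """Right-hand side of the gradient-flow ODE for logistic loss
    L(theta) = mean_i log(1 + exp(-y_i x_i . theta))."""
    def rhs(t, theta):
        margins = y * (X @ theta)
        grad = -(X * (y * stable_sigmoid(-margins))[:, None]).mean(axis=0)
        return -grad
    return rhs

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))           # placeholder data (see the sketch above)
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
theta0 = 0.01 * rng.normal(size=2)      # assumed small initialization

sol = solve_ivp(gradient_flow_rhs(X, y), t_span=(0.0, 1e5), y0=theta0)
theta_T = sol.y[:, -1]                  # classifier parameters at T = 1e5
```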
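Full-batch gradient descent sketch (Experiment Setup row). Only the learning rate η = 0.5 and the 1e5 steps come from the quote; the logistic loss and linear parameterization are assumptions.

```python
# Minimal sketch of the quoted training setup: full-batch gradient descent
# with learning rate eta = 0.5 for 1e5 steps. Logistic loss on a linear
# model is an assumption; the paper's notebook defines the actual models.
import numpy as np

def full_batch_gd(X, y, eta=0.5, steps=100_000):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ theta)
        # gradient of mean_i log(1 + exp(-m_i)) with respect to theta
        grad = -(X * (y / (1.0 + np.exp(np.clip(margins, -500, 500))))[:, None]).mean(axis=0)
        theta -= eta * grad
    return theta
```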
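Maximum ℓ2-margin sketch (Software Dependencies row). The quote only says scikit-learn's SVM package was used; approximating the hard-margin solution on separable data with a linear-kernel SVC and a very large C is an assumption about the exact call.

```python
# Minimal sketch: maximum ell_2-margin direction on linearly separable data,
# computed with scikit-learn's SVM package as referenced above. A linear
# kernel with a very large C approximates the hard-margin solution.
import numpy as np
from sklearn.svm import SVC

def max_l2_margin_direction(X, y):
    svm = SVC(kernel="linear", C=1e6)    # large C ~ hard-margin SVM (assumption)
    svm.fit(X, y)
    w = svm.coef_.ravel()
    return w / np.linalg.norm(w)         # unit-norm max-margin direction
```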