Feature Learning in $L_2$-regularized DNNs: Attraction/Repulsion and Sparsity

Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this bound is tight by giving an example of a local minimum that requires N²/4 hidden neurons. But we also observe numerically that in more traditional settings much less than N² neurons are required to reach the minima. ... Figure 2: Loss plateau: Plots of the train loss (full lines) and test loss (dashed lines) as a function of the width for depth L = 3 DNNs for different datasets: (left) cross-entropy loss for a subset of MNIST (N = 1000) and two values of λ; (right) MSE with λ = 10⁻⁶ on N = 1000 Gaussian inputs and outputs evaluated on a fixed teacher network of depth L = 3 and width 10.
Researcher Affiliation | Academia | Arthur Jacot, Courant Institute of Mathematical Sciences, New York University, arthur.jacot@nyu.edu; Eugene Golikov, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, evgenii.golikov@epfl.ch; Clément Hongler, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, clement.hongler@epfl.ch; Franck Gabriel, Institut de Science Financière et d'Assurances, Université Lyon 1, franckr.gabriel@gmail.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Figure 1: Attraction/Repulsion: Visualization of the hidden representations Z1 and Z2 of an L = 3 ReLU DNN at the end of training (i.e. after T = 20k steps of gradient descent on the original loss Lλ) on 3 digits (7, 8 and 9) of MNIST [15]... Figure 2: ... (left) cross-entropy loss for a subset of MNIST (N = 1000)... (A helper for reading off Z1 and Z2 is sketched after the table.)
Dataset Splits | No | The paper mentions training and testing, but does not specify explicit training/validation/test splits, percentages, or sample counts for a validation set.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as GPU models or CPU types.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | Figure 1: ... L = 3 ReLU DNN at the end of training (i.e. after T = 20k steps of gradient descent on the original loss Lλ)... Figure 2: ... depth L = 3 DNNs... (left) cross-entropy loss for a subset of MNIST (N = 1000) and two values of λ; (right) MSE with λ = 10⁻⁶ on N = 1000 Gaussian inputs and outputs... (A minimal reproduction sketch of this setup follows the table.)
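
The quoted setup can be mirrored in a short script. The Python sketch below is a hedged reconstruction of the Figure 2 (right) experiment: a depth-3 ReLU student trained by gradient descent on the L2-regularized MSE loss Lλ(θ) = MSE + λ‖θ‖² with λ = 10⁻⁶, on N = 1000 Gaussian inputs labelled by a fixed depth-3, width-10 teacher. The input/output dimensions, student width, learning rate, and step count are illustrative assumptions (the paper quotes T = 20k steps only for the Figure 1 MNIST run), not values reported in the paper.

```python
# Hedged sketch of the Figure 2 (right) setup: MSE + L2 regularization on
# Gaussian teacher-student data. Values marked "assumed" are illustrative
# choices, not hyperparameters reported in the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_out = 20, 1     # assumed input/output dimensions
N = 1000                # number of Gaussian training points (from the paper)
lam = 1e-6              # L2-regularization strength λ (from the paper)

def make_mlp(width, depth=3):
    """Depth-L fully connected ReLU network (L = 3 here: two hidden layers)."""
    layers, prev = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, d_out))
    return nn.Sequential(*layers)

# Fixed teacher of depth 3 and width 10 generates the targets.
teacher = make_mlp(width=10)
X = torch.randn(N, d_in)
with torch.no_grad():
    Y = teacher(X)

# Student network; its width can be varied to trace the loss plateau of Figure 2.
student = make_mlp(width=100)                          # assumed student width
opt = torch.optim.SGD(student.parameters(), lr=0.05)   # assumed learning rate

mse = nn.MSELoss()
for step in range(20_000):                             # assumed step count
    opt.zero_grad()
    # L_lambda(theta) = MSE + lambda * ||theta||^2 (the regularized loss Lλ)
    l2 = sum((p ** 2).sum() for p in student.parameters())
    loss = mse(student(X), Y) + lam * l2
    loss.backward()
    opt.step()

print(f"final regularized train loss: {loss.item():.6f}")
```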
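
For the Figure 1 visualization, the quantities of interest are the hidden representations Z1 and Z2 of the trained depth-3 network. The helper below, which reuses `student` and `X` from the sketch above, is one assumed way to read them off an `nn.Sequential` model (taken here as post-ReLU activations; the paper's exact convention may differ). The paper's actual Figure 1 trains on three MNIST classes (7, 8, 9) with cross-entropy rather than on Gaussian data.

```python
# Hedged helper for Figure 1: collect the hidden representations Z1 and Z2,
# taken here as the post-ReLU activations of the two hidden layers.
def hidden_representations(net, inputs):
    """Collect post-ReLU activations of a depth-3 nn.Sequential MLP."""
    zs, h = [], inputs
    with torch.no_grad():
        for layer in net:          # nn.Sequential is iterable over its modules
            h = layer(h)
            if isinstance(layer, nn.ReLU):
                zs.append(h.clone())
    return zs                      # [Z1, Z2] for a depth-3 network

Z1, Z2 = hidden_representations(student, X)
print(Z1.shape, Z2.shape)          # each of shape (N, student width)
```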