Feature Learning in $L_2$-regularized DNNs: Attraction/Repulsion and Sparsity
Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this bound is tight by giving an example of a local minimum that requires N²/4 hidden neurons. But we also observe numerically that in more traditional settings much less than N² neurons are required to reach the minima. ... Figure 2: Loss plateau: Plots of the train loss (full lines) and test loss (dashed lines) as a function of the width for depth L = 3 DNNs for different datasets: (left) cross-entropy loss for a subset of MNIST (N = 1000) and two values of λ; (right) MSE with λ = 10⁻⁶ on N = 1000 Gaussian inputs and outputs evaluated on a fixed teacher network of depth L = 3 and width 10. |
| Researcher Affiliation | Academia | Arthur Jacot, Courant Institute of Mathematical Sciences, New York University, arthur.jacot@nyu.edu; Eugene Golikov, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, evgenii.golikov@epfl.ch; Clément Hongler, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, clement.hongler@epfl.ch; Franck Gabriel, Institut de Science Financière et d'Assurances, Université Lyon 1, franckr.gabriel@gmail.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | Figure 1: Attraction/Repulsion: Visualization of the hidden representations Z1 and Z2 of a L = 3 ReLU DNN at the end of training (i.e. after T = 20k steps of gradient descent on the original loss Lλ) on 3 digits (7, 8 and 9) of MNIST [15]... Figure 2: ... (left) cross-entropy loss for a subset of MNIST (N = 1000)... |
| Dataset Splits | No | The paper mentions training and testing, but does not specify explicit training/validation/test dataset splits, percentages, or sample counts for a validation set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models or CPU types. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | Figure 1: ... L = 3 ReLU DNN at the end of training (i.e. after T = 20k steps of gradient descent on the original loss Lλ)... Figure 2: ...depth L = 3 DNNs... (left) cross-entropy loss for a subset of MNIST (N = 1000) and two values of λ; (right) MSE with λ = 10⁻⁶ on N = 1000 Gaussian inputs and outputs... (hedged code sketches of both setups follow this table) |
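
The first setup quoted above (a depth L = 3 ReLU network trained with cross-entropy plus an L2 penalty λ on an N = 1000 MNIST subset for T = 20k full-batch gradient descent steps) can be sketched as follows. This is a minimal reconstruction, not the authors' released code (the paper provides none): the hidden width, learning rate, λ value, and placeholder data are illustrative assumptions.

```python
# Minimal sketch of the L2-regularized training setup described above.
# Depth L = 3 (two hidden ReLU layers + readout), cross-entropy on an
# N = 1000 MNIST subset, T = 20k full-batch gradient descent steps.
# The hidden width, learning rate, and lambda below are NOT specified
# in the quoted text; they are illustrative assumptions.
import torch
import torch.nn as nn

N, T, lam, lr, width = 1000, 20_000, 1e-4, 0.1, 256  # lam, lr, width: assumed

# Placeholder data: substitute a real 1000-sample MNIST subset
# (e.g. loaded via torchvision) for a faithful reproduction.
X = torch.randn(N, 784)
y = torch.randint(0, 10, (N,))

model = nn.Sequential(  # depth L = 3
    nn.Linear(784, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=lr)
ce = nn.CrossEntropyLoss()

for step in range(T):
    opt.zero_grad()
    # L_lambda = cross-entropy + lambda * squared L2 norm of all parameters
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = ce(model(X), y) + lam * l2
    loss.backward()
    opt.step()
```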
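
The second setup, behind the right panel of Figure 2 (MSE with λ = 10⁻⁶ on N = 1000 Gaussian inputs labelled by a fixed depth-3, width-10 teacher network, with the train loss tracked as a function of student width), might be sketched as below. The input/output dimensions, optimizer, step count, and learning rate are assumptions, since the quoted text does not specify them.

```python
# Hedged sketch of the teacher-student width sweep behind Figure 2 (right):
# N = 1000 Gaussian inputs, targets from a fixed depth-3, width-10 teacher,
# MSE + lambda * ||params||^2 with lambda = 1e-6, student width varied.
# Input/output dimensions, step count, and learning rate are assumptions.
import torch
import torch.nn as nn

def mlp(d_in, width, d_out):
    """Depth L = 3 ReLU network: two hidden layers plus a linear readout."""
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )

N, d_in, d_out, lam = 1000, 10, 1, 1e-6  # d_in, d_out: assumed
X = torch.randn(N, d_in)
teacher = mlp(d_in, 10, d_out)           # fixed teacher of width 10
with torch.no_grad():
    y = teacher(X)

for width in [2, 5, 10, 20, 50, 100]:    # sweep the student width
    student = mlp(d_in, width, d_out)
    opt = torch.optim.SGD(student.parameters(), lr=0.05)
    for _ in range(5_000):               # step count: illustrative only
        opt.zero_grad()
        l2 = sum((p ** 2).sum() for p in student.parameters())
        loss = nn.functional.mse_loss(student(X), y) + lam * l2
        loss.backward()
        opt.step()
    print(f"width={width:4d}  train MSE={loss.item():.4g}")
```

Plotting the final train (and held-out test) MSE against width would reproduce the qualitative "loss plateau" the paper reports: well below N² neurons suffice to reach the minimal loss.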