Towards a General Theory of Infinite-Width Limits of Neural Classifiers

Authors: Eugene Golikov

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove a convergence theorem for it, and show that it provides a more reasonable approximation for finite-width nets compared to the NTK limit if learning rates are not very small. Also, our framework suggests a limit model that coincides neither with the MF limit nor with the NTK one. We show that for networks with more than two hidden layers RMSProp training has a non-trivial discrete-time MF limit but GD training does not have one. Overall, our framework demonstrates that both MF and NTK limits have considerable limitations in approximating finite-sized neural nets, indicating the need for designing more accurate infinite-width approximations for them. ... Figure 1. MF, NTK and intermediate scalings result in non-trivial limit models for a single layer neural net. ... We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent.
Researcher Affiliation | Academia | Neural Networks and Deep Learning lab., Moscow Institute of Physics and Technology, Moscow, Russia. Correspondence to: Eugene A. Golikov <golikov.ea@mipt.ru>.
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | No concrete access to source code is provided for the methodology described in this paper (no repository link, explicit code-release statement, or code in supplementary materials).
Open Datasets | Yes | We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent.
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning is provided. The paper only mentions using a 'subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000'.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running the experiments are provided.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment are provided.
Experiment Setup | Yes | Setup: We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent. We take a reference net of width d = 2^7 = 128 trained with unscaled reference learning rates η_a = η_w = 0.02 and scale its hyperparameters according to MF (blue curves), NTK (orange curves), and intermediate scaling with q_σ = 3/4 (green curves, see text). (from Figure 1 caption) and ...trained with (unscaled) reference learning rates η_a = η_w = 0.02 for GD and η_a = η_w = 0.0002 for RMSProp... (from Figure 2 caption). A code sketch of this setup follows below.
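To make the Experiment Setup row concrete, the minimal PyTorch sketch below reconstructs the Figure 1 setting under stated assumptions. It is not the author's code: the CIFAR2 filtering, the full-batch gradient-descent loop, and in particular the NTK and MF width-scaling exponents (`out_exp`, `lr_exp`) are illustrative conventions rather than the paper's exact definitions, and the intermediate scaling with q_σ = 3/4 is specific to the paper's framework and is therefore omitted here.

```python
# Minimal sketch (assumptions noted) of the Figure 1 setup: a 1-hidden-layer net
# trained with full-batch gradient descent on CIFAR2 (first two CIFAR10 classes,
# 1000 examples), with hyperparameters rescaled with width relative to a
# reference net of width d = 2**7 = 128 and reference learning rates 0.02.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# CIFAR2: the first two classes of CIFAR10, subset of size 1000.
transform = T.Compose([T.ToTensor(), T.Lambda(lambda x: torch.flatten(x))])
cifar = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                     transform=transform)
idx = [i for i, y in enumerate(cifar.targets) if y in (0, 1)][:1000]
loader = torch.utils.data.DataLoader(torch.utils.data.Subset(cifar, idx),
                                     batch_size=1000)  # full batch -> GD

d_ref = 2 ** 7        # reference width 128, as in the Figure 1 caption
eta_ref = 0.02        # unscaled reference learning rates (eta_a = eta_w = 0.02)

# Illustrative width-scaling exponents (assumptions, not the paper's definitions):
# NTK-style: output multiplier ~ d^{-1/2}, learning rates kept at the reference;
# MF-style:  output multiplier ~ d^{-1},   learning rates grown linearly in d.
SCALINGS = {
    "NTK": dict(out_exp=0.5, lr_exp=0.0),
    "MF":  dict(out_exp=1.0, lr_exp=1.0),
}

class OneHiddenLayer(nn.Module):
    """f(x) = c(d) * a^T relu(W x) with a width-dependent output multiplier c(d)."""
    def __init__(self, width, out_exp):
        super().__init__()
        self.w = nn.Linear(3 * 32 * 32, width)
        self.a = nn.Linear(width, 1, bias=False)
        self.c = width ** (-out_exp)

    def forward(self, x):
        return self.c * self.a(torch.relu(self.w(x))).squeeze(-1)

def train(scaling, width, steps=200):
    cfg = SCALINGS[scaling]
    net = OneHiddenLayer(width, cfg["out_exp"])
    # Rescale both learning rates with width relative to the reference net; the
    # paper's framework may scale eta_a and eta_w with different exponents.
    eta_a = eta_w = eta_ref * (width / d_ref) ** cfg["lr_exp"]
    opt = torch.optim.SGD([{"params": net.w.parameters(), "lr": eta_w},
                           {"params": net.a.parameters(), "lr": eta_a}])
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        for x, y in loader:  # single full batch per epoch -> gradient descent
            opt.zero_grad()
            loss_fn(net(x), y.float()).backward()
            opt.step()
    return net

# Example usage: compare the reference net with wider NTK- and MF-scaled nets.
# train("NTK", d_ref); train("NTK", 2 ** 10); train("MF", 2 ** 10)
```

The two learning rates are kept as separate optimizer parameter groups because the caption quotes them separately (η_a, η_w); here they are simply set equal to the scaled reference value.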