Towards a General Theory of Infinite-Width Limits of Neural Classifiers
Authors: Eugene Golikov
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove a convergence theorem for it, and show that it provides a more reasonable approximation for finite-width nets compared to the NTK limit if learning rates are not very small. Also, our framework suggests a limit model that coincides neither with the MF limit nor with the NTK one. We show that for networks with more than two hidden layers RMSProp training has a non-trivial discrete-time MF limit but GD training does not have one. Overall, our framework demonstrates that both MF and NTK limits have considerable limitations in approximating finite-sized neural nets, indicating the need for designing more accurate infinite-width approximations for them. ... Figure 1. MF, NTK and intermediate scalings result in non-trivial limit models for a single layer neural net. ... We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent. |
| Researcher Affiliation | Academia | 1Neural Networks and Deep Learning lab., Moscow Institute of Physics and Technology, Moscow, Russia. Correspondence to: Eugene A. Golikov <golikov.ea@mipt.ru>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper provides no concrete access to source code for the described methodology: no repository link, no explicit code-release statement, and no code in supplementary materials. |
| Open Datasets | Yes | We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning is provided. The paper only mentions using a 'subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000'. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments are provided. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment are provided. |
| Experiment Setup | Yes | Setup: We train a 1-hidden layer net on a subset of CIFAR2 (a dataset of the first two classes of CIFAR10) of size 1000 with gradient descent. We take a reference net of width d = 2^7 = 128 trained with unscaled reference learning rates η_a = η_w = 0.02 and scale its hyperparameters according to MF (blue curves), NTK (orange curves), and intermediate scaling with q_σ = 3/4 (green curves, see text). (from Figure 1 caption) and ...trained with (unscaled) reference learning rates η_a = η_w = 0.02 for GD and η_a = η_w = 0.0002 for RMSProp... (from Figure 2 caption). |
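
The reference configuration quoted in the Experiment Setup row is concrete enough to sketch. Below is a minimal PyTorch sketch of that reference run only, assuming full-batch gradient descent, a ReLU hidden layer, a logistic loss on ±1 labels, and 1/√fan-in initialization; none of these details beyond the dataset, width d = 2^7 = 128, and learning rates η_a = η_w = 0.02 are confirmed by the quoted captions. The width-dependent hyperparameter scalings for the MF, NTK, and intermediate (q_σ = 3/4) parameterizations are defined in the paper itself and are not reproduced here; all variable names are illustrative.

```python
# Hedged sketch of the reference setup: 1-hidden-layer net, width d = 2**7 = 128,
# trained with full-batch GD on CIFAR2 (first two classes of CIFAR-10), 1000 samples,
# reference learning rates eta_a = eta_w = 0.02. The MF / NTK / intermediate
# hyperparameter scalings studied in the paper are NOT applied here.
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

torch.manual_seed(0)

# CIFAR2: the first two classes of CIFAR-10, subset of size 1000.
cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
idx = [i for i, t in enumerate(cifar.targets) if t < 2][:1000]
X = torch.stack([cifar[i][0] for i in idx]).reshape(len(idx), -1)              # (1000, 3072)
y = torch.tensor([cifar.targets[i] for i in idx], dtype=torch.float32) * 2 - 1  # labels in {-1, +1}

# 1-hidden-layer net of width d = 128. The width-dependent normalization of the
# layers (roughly 1/d for MF, 1/sqrt(d) for NTK, with the q_sigma = 3/4 scaling in
# between) is exactly what the paper varies; 1/sqrt(fan-in) initialization is used
# here only as a numerically stable placeholder.
d, in_dim = 128, X.shape[1]
w = (torch.randn(in_dim, d) / in_dim ** 0.5).requires_grad_()  # input-to-hidden weights
a = (torch.randn(d) / d ** 0.5).requires_grad_()               # hidden-to-output weights

def forward(x):
    return torch.relu(x @ w) @ a

# Full-batch GD with the unscaled reference learning rates eta_a = eta_w = 0.02.
# The number of steps is an assumption; the captions do not state it.
eta_a = eta_w = 0.02
for step in range(100):
    loss = F.soft_margin_loss(forward(X), y)  # logistic loss on +/-1 labels (assumed)
    loss.backward()
    with torch.no_grad():
        w -= eta_w * w.grad
        a -= eta_a * a.grad
        w.grad.zero_()
        a.grad.zero_()
```

The RMSProp runs quoted from the Figure 2 caption would replace the manual update above with an RMSProp optimizer at η_a = η_w = 0.0002; that variant is not sketched here.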