Neural Networks as Kernel Learners: The Silent Alignment Effect

Authors: Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that such an effect takes place in homogeneous neural networks with small initialization and whitened data. We provide an analytical treatment of this effect in the fully connected linear network case. In general, we find that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network's tangent kernel. The early spectral learning of the kernel depends on the depth. We also demonstrate that non-whitened data can weaken the silent alignment effect. (See the kernel-target alignment sketch after the table.)
Researcher Affiliation | Academia | Alexander Atanasov, Blake Bordelon & Cengiz Pehlevan, Harvard University, Cambridge, MA 02138, USA. {atanasov,blake_bordelon,cpehlevan}@g.harvard.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We trained a 2-layer ReLU MLP on P = 1000 MNIST images of handwritten 0s and 1s which were whitened. Early in training, around t ≈ 50, the NTK aligns to the target function and stays fixed (green). The kernel's overall scale (orange) and the loss (blue) begin to move at around t = 300. The analytic solution for the maximal final alignment value in linear networks is overlaid (dashed green); see Appendix E.2. (b) We compare the predictions of the NTK and the trained network on MNIST test points. Due to silent alignment, the final learned function is well described as a kernel regression solution with the final NTK K. However, regression with the initial NTK is not a good model of the network's predictions. (c) The same experiment on P = 1000 whitened CIFAR-10 images from the first two classes. Here we use MSE loss on a width-100 network with initialization scale σ = 0.1. (See the whitening sketch after the table.)
Dataset Splits | No | The paper provides specific training and test set details, but does not explicitly mention or detail a separate validation dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software like the Adam optimizer and the Neural Tangents API, but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We trained a 2-layer ReLU MLP on P = 1000 MNIST images of handwritten 0s and 1s which were whitened. The same experiment was run on P = 1000 whitened CIFAR-10 images from the first two classes, using MSE loss on a width-100 network with initialization scale σ = 0.1. (See the training and final-NTK regression sketch after the table.)
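
The "alignment" referenced in the Research Type row is a similarity between the NTK Gram matrix and the training targets. Below is a minimal sketch of the standard kernel-target alignment measure (in the style of Cristianini et al.); the function name and the exact normalization are illustrative assumptions and may differ in detail from the quantity plotted in the paper.

```python
import jax.numpy as jnp

def kernel_target_alignment(K, y):
    """Cosine similarity between the Gram matrix K (shape P x P) and the
    rank-one target matrix y y^T, where y holds the P training labels."""
    yKy = y @ K @ y                              # <K, y y^T>_Frobenius
    return yKy / (jnp.linalg.norm(K) * (y @ y))  # ||y y^T||_F = y^T y
```

An alignment near 1 means the kernel's top eigendirection is dominated by the target, which is the regime in which kernel regression with that kernel reproduces the learned function.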
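The Open Datasets row quotes experiments on whitened MNIST and CIFAR-10 subsets. A minimal ZCA-style whitening sketch is below, assuming the P = 1000 selected images are already loaded into a (P, d) array; the loading step, variable names, and the eps regularizer are assumptions, not taken from the paper.

```python
import jax.numpy as jnp

def zca_whiten(X, eps=1e-5):
    """Return a zero-mean copy of X (shape (P, d)) with identity covariance.
    eps regularizes near-zero eigenvalues (an implementation choice, not
    specified in the paper)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    evals, evecs = jnp.linalg.eigh(cov)
    W = evecs @ jnp.diag(1.0 / jnp.sqrt(evals + eps)) @ evecs.T
    return Xc @ W

# Assumed to exist: X_mnist, a (1000, 784) array of the selected 0s and 1s.
# X_white = zca_whiten(X_mnist)
```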
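The Experiment Setup row describes a 2-layer ReLU MLP of width 100 with small initialization scale σ = 0.1, trained with MSE loss, whose final predictions are compared against kernel regression with the final empirical NTK. Below is a minimal JAX sketch of that comparison; the parameterization, learning rate, ridge term, and helper names are assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in, width=100, sigma=0.1):
    # 2-layer ReLU MLP with small initialization scale sigma; the exact
    # parameterization (standard, 1/sqrt(fan-in)) is an assumption.
    k1, k2 = jax.random.split(key)
    W1 = sigma * jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in)
    W2 = sigma * jax.random.normal(k2, (1, width)) / jnp.sqrt(width)
    return (W1, W2)

def f(params, X):
    W1, W2 = params
    return (jax.nn.relu(X @ W1.T) @ W2.T).squeeze(-1)

def mse(params, X, y):
    return 0.5 * jnp.mean((f(params, X) - y) ** 2)

def empirical_ntk(params, X1, X2):
    # K(x, x') = <df/dtheta(x), df/dtheta(x')>, from per-example Jacobians.
    grad_fn = lambda x: jax.grad(lambda p: f(p, x[None]).squeeze())(params)
    J1, J2 = jax.vmap(grad_fn)(X1), jax.vmap(grad_fn)(X2)
    flat = lambda J: jnp.concatenate(
        [j.reshape(j.shape[0], -1) for j in jax.tree_util.tree_leaves(J)], axis=1)
    return flat(J1) @ flat(J2).T

@jax.jit
def step(params, X, y, lr=1.0):
    # One full-batch gradient descent step (learning rate is a placeholder).
    grads = jax.grad(mse)(params, X, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

def final_ntk_regression(params_final, X_train, y_train, X_test, ridge=1e-8):
    # Kernel regression with the *final* NTK, to compare against the trained net.
    K = empirical_ntk(params_final, X_train, X_train)
    k_star = empirical_ntk(params_final, X_test, X_train)
    alpha = jnp.linalg.solve(K + ridge * jnp.eye(K.shape[0]), y_train)
    return k_star @ alpha
```

A driver loop would call step repeatedly until the loss is small, then compare f(params, X_test) with final_ntk_regression(params, X_train, y_train, X_test); under silent alignment the two should nearly coincide, while regression with the initial NTK should not.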