Spectral Evolution and Invariance in Linear-width Neural Networks
Authors: Zhichao Wang, Andrew Engel, Anand D Sarwate, Ioana Dumitriu, Tony Chiang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that the spectra of the weights in this high-dimensional regime are invariant when trained by gradient descent with small constant learning rates; we provide a theoretical justification for this observation and prove the invariance of the bulk spectra for both conjugate and neural tangent kernels. We demonstrate similar characteristics when training with stochastic gradient descent with small learning rates. We exhibit different spectral properties, such as an invariant bulk, spikes, and heavy-tailed distributions, from a two-layer neural network under different training strategies, and then correlate them with feature learning. Analogous phenomena also appear when we train conventional neural networks on real-world data. (A minimal numerical sketch of this before/after spectral comparison appears below the table.) |
| Researcher Affiliation | Collaboration | Zhichao Wang (University of California San Diego) zhw036@ucsd.edu; Andrew Engel (Pacific Northwest National Laboratory) andrew.engel@pnnl.gov; Anand Sarwate (Rutgers, The State University of New Jersey) ads221@soe.rutgers.edu; Ioana Dumitriu (University of California San Diego) idumitriu@ucsd.edu; Tony Chiang (Pacific Northwest National Laboratory; University of Washington; University of Texas at El Paso) tony.chiang@pnnl.gov |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented. |
| Open Source Code | No | The paper does not provide a direct statement about releasing its own source code for the described methodology or a link to a repository. |
| Open Datasets | Yes | First, we show the spectra of KNTK before and after training for binary classification on CIFAR-2 through small CNNs in Figure 4. We also investigate the spectral properties of the pre-trained model BERT from [26], with fine-tuning on the Sentiment140 dataset of tweets from [36]. (A sketch of constructing the CIFAR-2 subset appears below the table.) |
| Dataset Splits | No | The paper mentions training and testing data but does not provide explicit percentages, counts, or specific methodology for dataset splits. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments. |
| Software Dependencies | No | The paper mentions software like BERT, Adam, SGD, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Four models share the same architecture (n = 2000, h = 1500, d = 1000, and σ a normalized tanh) but differ in their initial learning rates and optimizers, as listed in Table 1. The training label noise is σ_ε = 0.3 and the teacher model is defined by Eq. (9) with σ a normalized softplus and τ = 0.2. For fine-tuning, the learning rate is 0.003, the batch size is 64, and the momentum is 0.8. (Sketches wiring up these hyperparameters appear below the table.) |
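
The invariance claim quoted in the Research Type row can be checked numerically by comparing the singular-value spectrum of the first-layer weights before and after training. The sketch below is not the authors' code: the Gaussian data model, the fixed second layer, the plain tanh activation, and the learning rate are illustrative assumptions; only the dimensions n = 2000, h = 1500, d = 1000 come from Table 1.

```python
# Minimal sketch (not the authors' code) of the bulk-spectrum comparison:
# train a two-layer network by full-batch gradient descent with a small
# constant learning rate and compare the first-layer singular values
# before and after training.
import numpy as np
import torch

torch.manual_seed(0)
n, d, h = 2000, 1000, 1500        # samples, input dim, hidden width (Table 1)
lr, steps = 0.01, 200             # "small constant learning rate": assumed value

X = torch.randn(n, d) / d ** 0.5  # isotropic Gaussian inputs (assumption)
y = torch.randn(n)                # placeholder targets; the paper uses a teacher model

W = torch.randn(h, d, requires_grad=True)  # first layer, N(0, 1) entries
a = torch.randn(h) / h ** 0.5              # second layer, held fixed (assumption)

def singular_values(weight):
    # Bulk spectrum of W / sqrt(d), the object whose invariance is studied.
    return torch.linalg.svdvals(weight.detach() / d ** 0.5).numpy()

svals_before = singular_values(W)
for _ in range(steps):
    pred = torch.tanh(X @ W.T) @ a   # plain tanh stands in for the normalized tanh
    loss = torch.mean((pred - y) ** 2)
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad             # full-batch gradient descent step
    W.grad = None
svals_after = singular_values(W)

# Invariance of the bulk: the two empirical distributions should nearly coincide.
print(np.histogram(svals_before, bins=20)[0])
print(np.histogram(svals_after, bins=20)[0])
```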
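
The Open Datasets row refers to CIFAR-2, a two-class subset of CIFAR-10 used for binary classification. A minimal way to build such a subset with torchvision is sketched below; the particular class pair is an assumption, since the paper does not state which two classes are used.

```python
# Sketch of building a "CIFAR-2" binary subset from CIFAR-10 with torchvision.
# The class pair (here 0 = airplane vs. 1 = automobile) is an assumption.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)
keep_classes = (0, 1)  # assumed pair of CIFAR-10 classes; labels are already 0/1
idx = [i for i, t in enumerate(full_train.targets) if t in keep_classes]
cifar2_train = Subset(full_train, idx)

print(len(cifar2_train), "images in the two-class subset")
```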
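
For the fine-tuning setup in the Experiment Setup row (learning rate 0.003, batch size 64, momentum 0.8), the sketch below shows how those values might map onto a standard SGD configuration. The bert-base-uncased checkpoint and the Hugging Face transformers API are assumptions; the paper only states that BERT from [26] is fine-tuned on Sentiment140.

```python
# Hedged sketch of the reported fine-tuning hyperparameters:
# learning rate 0.003, batch size 64, momentum 0.8.
# The checkpoint name and library choice are assumptions, not the paper's code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # Sentiment140 is a binary sentiment task
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# SGD with the momentum value quoted in the table.
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.8)

# A DataLoader over a tokenized Sentiment140 split (construction omitted) would use:
# loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
```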