Double-Descent Curves in Neural Networks: A New Perspective Using Gaussian Processes

Authors: Ouns El Harzli, Bernardo Cuenca Grau, Guillermo Valle-Pérez, Ard A. Louis

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide empirical evidence demonstrating the accuracy of our predictions of the spectral distribution of NNGP kernel random feature matrices, as well as the manifestation of the double-descent phenomenon in the generalisation error of NNGP kernel regression. These experiments were carried out using GPU resources on Google Colab. We have simulated the empirical spectral distribution of the kernel random matrix K^N_{X,X} for high values of N, n, d, for ReLU and tanh, with both a synthetic dataset with data drawn from an isotropic multivariate Gaussian distribution P_d = N(0, (1/d) I_d), and the MNIST dataset (LeCun 2012). As illustrated in Figure 1, we find excellent agreement with the theoretical prediction of the limiting spectral distributions. We used the Marchenko-Pastur fixed-point equation (1) to compute the limiting spectral distribution ρ^MP_γ ⊠ μ_{φ,ψ}, by iterating over the recursive sequence it defines in Stieltjes-transform space and then inverting the Stieltjes transform using the inversion formula. In the case of synthetic data drawn from P_d = N(0, (1/d) I_d) and with no nonlinearity, the actual NNGP kernel matrix can be characterised exactly by μ_{φ,ψ} = ρ^MP_ψ. In general, the actual NNGP kernel is not known, hence we estimated the actual NNGP kernel matrix by sampling K^N̂_{X,X} with a very large value N̂ ≫ n, relying on the fact that ρ^MP_0 ⊠ μ_{φ,ψ} = μ_{φ,ψ}. We focused on a subset of MNIST restricted to digits 0 and 1 in order to simplify the structure of the covariance matrices and their spectral distributions.
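The quoted passage describes iterating the Marchenko-Pastur fixed-point equation in Stieltjes-transform space and then applying the inversion formula. A minimal numerical sketch of that recipe (not the authors' code): the self-consistent equation below is the standard Marchenko-Pastur form for a population spectrum μ, and the damping constant, grid, and iteration count are illustrative assumptions.

```python
import numpy as np

def mp_density(x_grid, gamma, pop_eigs, eps=1e-3, n_iter=2000, damping=0.5):
    """Density of the limiting law rho_gamma^MP ⊠ mu on x_grid.

    Iterates the standard Marchenko-Pastur self-consistent equation
        m(z) = mean_t [ 1 / (t*(1 - gamma - gamma*z*m(z)) - z) ],
    with t ranging over the population spectrum mu, using a damped
    fixed-point step, then recovers the density via the Stieltjes
    inversion formula rho(x) = lim_{eps->0+} Im m(x + i*eps) / pi.
    """
    t = np.asarray(pop_eigs, dtype=complex)
    dens = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        z = x + 1j * eps
        m = -1.0 / z                      # initial guess in the upper half-plane
        for _ in range(n_iter):
            m_new = np.mean(1.0 / (t * (1.0 - gamma - gamma * z * m) - z))
            m = damping * m + (1.0 - damping) * m_new   # damped fixed-point step
        dens[i] = m.imag / np.pi
    return dens
```

For the white case μ = δ_1 this reduces to the classical Marchenko-Pastur density; for an estimated NNGP kernel one would pass the eigenvalues of the sampled kernel matrix as `pop_eigs`.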
Researcher Affiliation | Academia | 1 Department of Computer Science, University of Oxford; 2 Rudolf Peierls Centre for Theoretical Physics, University of Oxford
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the methodology described.
Open Datasets | Yes | We have simulated the empirical spectral distribution of the kernel random matrix K^N_{X,X} for high values of N, n, d, for ReLU and tanh, with both a synthetic dataset with data drawn from an isotropic multivariate Gaussian distribution P_d = N(0, (1/d) I_d), and the MNIST dataset (LeCun 2012).
Dataset Splits | No | The paper does not specify training, validation, and test splits with percentages or counts. It only mentions using a "small subset (300 examples)" for estimation, with no explicit data-partitioning details.
Hardware Specification | No | The paper states "These experiments were carried out using GPU resources on Google Colab," which is too general. It does not provide specific GPU models, CPU details, or other precise hardware specifications.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | Top, we use a two-layer NNGP without non-linearities with teacher distribution N(0, (1/d) I_d) using N = 300, n = 200, and d = 400. Bottom, we use a two-layer ReLU NNGP on a subset of MNIST taking N = 600, n = 300, and d = 784 (pixels on MNIST images).
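The top-panel configuration quoted above can be simulated in a few lines. This is a hedged sketch, not the authors' code: it assumes the common random-feature convention K^N_{X,X} = (1/N) Φ^T Φ with Φ = W X and i.i.d. N(0, 1) weights, which matches a two-layer NNGP with identity activation; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 300, 200, 400                     # sizes from the top panel

# Synthetic data: n columns drawn from the teacher distribution N(0, (1/d) I_d)
X = rng.normal(scale=1.0 / np.sqrt(d), size=(d, n))

# Two-layer NNGP without non-linearity: random features Phi = W X,
# kernel random matrix K^N_{X,X} = (1/N) Phi^T Phi  (an n x n matrix)
W = rng.normal(size=(N, d))
Phi = W @ X
K = Phi.T @ Phi / N

# Empirical spectral distribution: histogram these eigenvalues and
# compare against the predicted limiting law
eigs = np.linalg.eigvalsh(K)
```

Swapping in `np.maximum(Phi, 0.0)` or `np.tanh(Phi)` (with MNIST columns as X) would give the ReLU and tanh variants described in the paper.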