What Can Be Learnt With Wide Convolutional Neural Networks?

Authors: Francesco Cagnetta, Alessandro Favero, Matthieu Wyart

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that, despite their hierarchical structure, the functions generated by infinitely-wide deep CNNs are too rich to be efficiently learnable in high dimension. ... We confirm these results through extensive numerical studies and find them to hold even if the non-overlapping patches assumption is relaxed (Appendix G.4). ... Numerical experiments. We test our predictions by training a hierarchical kernel (student) on a random Gaussian function with zero mean and covariance given by another hierarchical kernel (teacher)."
Researcher Affiliation | Academia | "1 Institute of Physics, École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland. 2 Institute of Electrical Engineering, École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland."
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | "The repository containing all codes used to obtain the reported results can be found at https://github.com/pcsl-epfl/convolutional_neural_kernels."
Open Datasets | Yes | "To illustrate this point, we trained a modified LeNet architecture with non-overlapping patches and no pooling layers on CIFAR-10, then compared the generalisation error with that of a standard LeNet architecture (LeCun et al., 1998) trained with the same hyperparameters. ... Figure S2 shows the learning curves of the neural tangent kernels of different architectures applied to pairs of classes of the CIFAR-10 dataset." A structural sketch of such a modified LeNet follows the table.
Dataset Splits | No | The paper mentions n for training points and n_test for unseen examples, but does not specify a separate validation split. For example, in G.2: "We use n ∈ {128, 256, 512, 1024, 2048, 4096, 8192} and n_test = 8192." No mention of validation data.
Hardware Specification | Yes | "Experiments were run on a high-performance computing cluster with nodes having Intel Xeon Gold processors with 20 cores and 192 GB of DDR4 RAM."
Software Dependencies | Yes | "All codes are written in PyTorch (Paszke et al., 2019)."
Experiment Setup | Yes | "In order to obtain the learning curves, we generate n + n_test random points uniformly distributed on the product of hyperspheres over the patches. We use n ∈ {128, 256, 512, 1024, 2048, 4096, 8192} and n_test = 8192. For each value of n, we sample a Gaussian random field with zero mean and covariance given by the teacher kernel. ... The expectation over the teacher randomness is obtained by averaging over 16 independent sets of random input points and realisations of the Gaussian random fields. As teacher and student kernels, we use the analytical forms of the neural tangent kernels of hierarchical convolutional networks, with different combinations of depths and filter sizes." A toy reconstruction of this teacher-student protocol is sketched directly below the table.
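
The experiment-setup quote describes a teacher-student protocol: inputs are drawn uniformly on a product of unit hyperspheres (one per patch), a target function is sampled as a Gaussian random field whose covariance is the teacher kernel, and a student kernel is fitted to it by kernel regression. The sketch below is a minimal, hypothetical reconstruction of that protocol in PyTorch, not the authors' code: the `hierarchical_kernel` used here is a simple per-patch arc-cosine kernel standing in for the paper's analytical convolutional NTKs, and all function names, sizes, and the ridge regularisation are assumptions.

```python
import torch

torch.set_default_dtype(torch.float64)  # kernel matrices benefit from double precision

def sample_patch_sphere_points(n, num_patches, patch_dim):
    """Draw n inputs uniformly on a product of unit hyperspheres, one sphere per patch."""
    x = torch.randn(n, num_patches, patch_dim)
    return x / x.norm(dim=-1, keepdim=True)

def hierarchical_kernel(x1, x2):
    """Placeholder per-patch arc-cosine kernel, averaged over patches.
    The paper instead uses the analytical NTKs of hierarchical CNNs."""
    dots = torch.einsum('ipd,jpd->ijp', x1, x2).clamp(-1.0, 1.0)
    return (1.0 - torch.arccos(dots) / torch.pi).mean(dim=-1)

def teacher_student_error(n_train, n_test, num_patches=4, patch_dim=3, ridge=1e-8):
    x = sample_patch_sphere_points(n_train + n_test, num_patches, patch_dim)
    # Target: Gaussian random field with zero mean and covariance given by the teacher kernel.
    k_teacher = hierarchical_kernel(x, x)
    chol = torch.linalg.cholesky(k_teacher + 1e-8 * torch.eye(len(x)))  # small jitter for stability
    f = chol @ torch.randn(len(x))
    x_tr, y_tr, x_te, y_te = x[:n_train], f[:n_train], x[n_train:], f[n_train:]
    # Student: kernel ridge regression with its own (here identical) kernel.
    k_tr = hierarchical_kernel(x_tr, x_tr)
    k_te = hierarchical_kernel(x_te, x_tr)
    alpha = torch.linalg.solve(k_tr + ridge * torch.eye(n_train), y_tr)
    return ((k_te @ alpha - y_te) ** 2).mean().item()

# One point of a learning curve, averaged over 16 teacher realisations (as in the quote).
errors = [teacher_student_error(n_train=128, n_test=512) for _ in range(16)]
print(sum(errors) / len(errors))
```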
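
The Open Datasets row refers to a modified LeNet with non-overlapping patches and no pooling layers trained on CIFAR-10. The quoted text does not give the exact architecture or hyperparameters, so the following PyTorch sketch only illustrates the structural idea under stated assumptions: convolutions whose stride equals their kernel size (so receptive patches do not overlap), pooling layers removed, and channel widths borrowed from the standard LeNet for illustration.

```python
import torch.nn as nn

class NonOverlappingLeNet(nn.Module):
    """LeNet-style CNN where each convolution acts on non-overlapping patches
    (stride == kernel size) and pooling layers are removed. Widths are illustrative."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # 3x32x32 -> 6x8x8: 4x4 non-overlapping patches
            nn.Conv2d(3, 6, kernel_size=4, stride=4),
            nn.ReLU(),
            # 6x8x8 -> 16x4x4: 2x2 non-overlapping patches
            nn.Conv2d(6, 16, kernel_size=2, stride=2),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```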