Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning

Authors: Yuxiao Wen, Arthur Jacot

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck, and partially prove that the parameter norm required to represent a function f scales as depth times the CBN rank of f. We also show that the parameter norm depends at next order on the regularity of f. We show that any network with almost optimal parameter norm will exhibit a CBN structure in both the weights and, under the assumption that the network is stable under large learning rates, in the activations, which motivates the common practice of downsampling; and we verify that the CBN results still hold with downsampling. Finally, we use the CBN structure to interpret the functions learned by CNNs on a number of tasks. (See the norm-scaling sketch below the table.)
Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA. Correspondence to: Yuxiao Wen <yuxiaowen@nyu.edu>, Arthur Jacot <arthur.jacot@nyu.edu>.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not provide any information or links regarding the availability of its source code.
Open Datasets | Yes | For our numerical experiments, we train networks on 4 different tasks, with different depths and ridge parameters. ... MNIST classification: For MNIST classification the CNN features a global pooling layer at the end, followed by a final fully-connected layer. ... MNIST digit-0 autoencoder: When training an autoencoder, the network keeps 3 constant freq. along with 4 degree-1 freq. and 1 degree-2 freq. ... Autoencoder on synthetic data: We train an autoencoder on data obtained as the pixelwise multiplication of a low-freq. shape with a high-freq. repeating pattern (a single freq. (5, 5) Fourier function with random phase). ... Learning Newtonian mechanics: We train a network to predict the trajectory of a ball: the inputs to the network are four frames of a ball under gravity (with different frames encoded in different channels) with a random initial position and velocity, from which the network has to predict the next 4 frames. (See the synthetic-data sketch below the table.)
Dataset Splits | No | The paper does not provide specific details on training, validation, and test splits for the datasets used in the numerical experiments.
Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., GPU models, CPU types).
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For our numerical experiments, we train networks on 4 different tasks, with different depths and ridge parameters. We use filters with full size and cyclic boundaries. The pooling operator is Mβ = (1 − β)I + βA3, where A3 is the 3 × 3 average filter. We use a few different values of β. For the MNIST classification task, we also implement downsampling in the 2nd and 4th layers. ... Figure 1. We train a CNN (L = 11, cℓ = 60, λ = 0.005, β = 0.5) on MNIST. ... Figure 2. We train an autoencoder (L = 12, cℓ = 50, λ = 0.04, β = 1.0) on the 0-digits of MNIST downscaled to size 13 × 13. ... Figure 3. CNN (L = 10, cℓ = 60, λ = 0.0005, β = 0.25) trained on images. ... Figure 4. CNN (L = 9, cℓ = 60, λ = 0.0001, β = 0.25) learns to predict the trajectory of a ball. (See the pooling-operator sketch below the table.)
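
For orientation, the norm-scaling claim quoted in the Research Type row can be written compactly. The following is a hedged paraphrase of the abstract, not the paper's theorem statement: the symbol c(f) for the regularity-dependent next-order term is our label, and the abstract itself describes the scaling as only partially proven.

```latex
% Hedged paraphrase of the abstract's central scaling claim:
% the parameter norm needed to represent f grows linearly in the
% depth L, with slope given by the CBN rank of f.
\[
  \min_{\theta \,:\, f_\theta = f} \|\theta\|^2
  \;\approx\; L \cdot \mathrm{rank}_{\mathrm{CBN}}(f) \;+\; c(f)
\]
% rank_CBN(f): number and type of frequencies kept in the bottleneck.
% c(f): next-order term depending on the regularity of f (our label,
% not the paper's notation).
```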
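
The synthetic autoencoder data in the Open Datasets row has a constructive description, so a minimal NumPy sketch of one sample is possible. Only the pixelwise product and the single frequency-(5, 5) Fourier function with random phase come from the quoted text; the image size, the soft-ellipse low-frequency shape, and the name synthetic_sample are assumptions for illustration.

```python
import numpy as np

def synthetic_sample(size=28, rng=None):
    """One sample: a low-frequency shape times a high-frequency pattern.

    From the paper's description: pixelwise multiplication of a
    low-freq. shape with a single frequency-(5, 5) Fourier function
    with random phase. Everything else here is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    y, x = np.mgrid[0:size, 0:size] / size  # grid coordinates in [0, 1)

    # Low-frequency "shape": a soft random ellipse (our choice).
    cy, cx = rng.uniform(0.3, 0.7, size=2)
    r = rng.uniform(0.15, 0.3)
    shape = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / r ** 2)

    # High-frequency repeating pattern: frequency (5, 5), random phase.
    phase = rng.uniform(0.0, 2.0 * np.pi)
    pattern = np.cos(2.0 * np.pi * (5.0 * y + 5.0 * x) + phase)

    return shape * pattern  # pixelwise multiplication

# A dataset is then a stack of independent samples, e.g.:
# data = np.stack([synthetic_sample() for _ in range(1024)])
```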
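
The Experiment Setup row gives the pooling operator explicitly as Mβ = (1 − β)I + βA3 with cyclic boundaries, which is concrete enough to sketch. The formula is the paper's; realizing the cyclic boundary condition with np.roll and the name pool_beta are our choices.

```python
import numpy as np

def pool_beta(img, beta):
    """Apply M_beta = (1 - beta) * I + beta * A3 to a 2D image.

    A3 is the 3x3 average filter; cyclic (periodic) boundaries are
    realized by averaging the nine wrapped shifts of the image.
    """
    avg = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            avg += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    avg /= 9.0  # mean over the 3x3 cyclic neighborhood
    return (1.0 - beta) * img + beta * avg
```

Here β = 0 leaves the image unchanged and β = 1 is pure 3 × 3 averaging, consistent with the β values (0.25, 0.5, 1.0) quoted in the setup.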