Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Authors: Arthur Jacot

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5, "Numerical Experiment: Symmetry Learning", and "We observe in Figure 2 that a large depth L2-regularized network trained on this task learns the rotation symmetry of the task and learns two dimensional hidden representations that are summary statistics"
Researcher Affiliation | Academia | Arthur Jacot, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, arthur.jacot@nyu.edu
Pseudocode | No | No pseudocode or algorithm block was found in the paper.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | No | "The training data is synthetic and designed to have an optimal rank k = 2." and "For the first numerical experiment, the data pairs (x, y) were generated as follows. First we sample an 8-dimensional latent vector z, from which we define x = g(z_1, ..., z_8) ∈ R^20 and y = h(z_1, z_2) ∈ R^20 for two random functions g : R^8 → R^20 and h : R^2 → R^20 given by two shallow networks with random parameters." (A data-generation sketch follows the table.)
Dataset Splits | No | The paper mentions 'training data' and 'final train cost' but does not specify exact train/validation/test splits, sample counts for each split, or cross-validation methodology.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | "A depth L = 25 network with a width of 200 trained on the task described in Section 5 with a ridge λ = 0.0002." and "The training data is synthetic and designed to have an optimal rank k = 2. We see different ranges of depth where the network converges to different ranks, with larger depths leading to smaller ranks, until training fails and recovers the zero parameters for L > 25. Within each range the norm ‖θ‖² is well approximated by an affine function with slope equal to the rank." (Training and rank-measurement sketches follow the table.)
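
The "Open Datasets" row describes the synthetic data-generation procedure. Below is a minimal sketch of that procedure, assuming NumPy, Gaussian latent vectors, and tanh shallow networks; the hidden width of the random networks g and h, the nonlinearity, and the sample count are assumptions not stated in the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shallow_net(d_in, d_out, width=100, rng=rng):
    """Random shallow (one-hidden-layer) map f: R^d_in -> R^d_out.
    The width and the tanh nonlinearity are assumptions; the paper only
    says 'shallow networks with random parameters'."""
    W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=(d_out, width)) / np.sqrt(width)
    return lambda z: np.tanh(z @ W1.T + b1) @ W2.T

g = random_shallow_net(d_in=8, d_out=20)   # x = g(z_1, ..., z_8)
h = random_shallow_net(d_in=2, d_out=20)   # y = h(z_1, z_2), so the optimal rank is k = 2

def sample_pairs(n, rng=rng):
    z = rng.normal(size=(n, 8))            # 8-dimensional latent vectors
    x = g(z)                               # inputs in R^20
    y = h(z[:, :2])                        # targets depend only on (z_1, z_2)
    return x, y

x_train, y_train = sample_pairs(1024)      # sample count is an assumption
```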
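The "Experiment Setup" row quotes a depth L = 25, width 200 network trained with ridge λ = 0.0002. The PyTorch sketch below continues from the data-generation sketch above and trains under those quoted settings; the ReLU nonlinearity, the Adam optimizer, the learning rate, and the number of steps are assumptions not given in the excerpt.

```python
import torch
import torch.nn as nn

L, width, ridge = 25, 200, 2e-4            # depth, width, and λ as quoted from the paper

def make_mlp(d_in=20, d_out=20, depth=L, width=width):
    """Fully connected network with `depth` weight layers."""
    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d_out))
    return nn.Sequential(*layers)

net = make_mlp()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.as_tensor(x_train, dtype=torch.float32)
y = torch.as_tensor(y_train, dtype=torch.float32)

for step in range(10_000):
    opt.zero_grad()
    mse = ((net(x) - y) ** 2).mean()
    # L2 ("ridge") regularization on all parameters with weight λ = 0.0002
    l2 = sum((p ** 2).sum() for p in net.parameters())
    loss = mse + ridge * l2
    loss.backward()
    opt.step()
```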
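Finally, the "Research Type" and "Experiment Setup" rows refer to low-dimensional (roughly rank-2) hidden representations at large depth. One way to probe such a claim is to compute the numerical rank of each hidden layer's activations, as sketched below, continuing from the training sketch; the SVD threshold and the choice to center the activations are assumptions, not the paper's exact measurement.

```python
import torch

@torch.no_grad()
def hidden_ranks(net, x, tol=1e-3):
    """Numerical rank of each hidden-layer representation.
    Singular values below tol * (largest singular value) are treated as zero."""
    ranks = []
    h = x
    for layer in net:
        h = layer(h)
        if isinstance(layer, torch.nn.ReLU):          # record post-activation features
            s = torch.linalg.svdvals(h - h.mean(0))   # singular values of centered activations
            ranks.append(int((s > tol * s[0]).sum()))
    return ranks

print(hidden_ranks(net, x))   # the paper reports roughly two-dimensional representations at large depth
```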