Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff
Authors: Arthur Jacot
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, "Numerical Experiment: Symmetry Learning": "We observe in Figure 2 that a large depth L2-regularized network trained on this task learns the rotation symmetry of the task and learns two dimensional hidden representations that are summary statistics." |
| Researcher Affiliation | Academia | Arthur Jacot, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, arthur.jacot@nyu.edu |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | No | "The training data is synthetic and designed to have an optimal rank k = 2." and "For the first numerical experiment, the data pairs (x, y) were generated as follows. First we sample an 8-dimensional latent vector z, from which we define x = g(z_1, ..., z_8) ∈ R^20 and y = h(z_1, z_2) ∈ R^20 for two random functions g : R^8 → R^20 and h : R^2 → R^20 given by two shallow networks with random parameters." (A hedged sketch of this generation procedure is given below the table.) |
| Dataset Splits | No | The paper mentions 'training data' and 'final train cost' but does not specify exact train/validation/test splits, sample counts for each split, or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | "A depth L = 25 network with a width of 200 trained on the task described in Section 5 with a ridge λ = 0.0002." and "The training data is synthetic and designed to have an optimal rank k = 2. We see different ranges of depth where the network converges to different ranks, with larger depths leading to smaller rank, until training fails and recovers the zero parameters for L > 25. Within each range the norm ‖θ‖² is well approximated by an affine function with slope equal to the rank." (A hedged sketch of this setup is given below the table.) |
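
Since the paper releases no code, the following is a minimal sketch of the synthetic data generation quoted in the Open Datasets row: sample an 8-dimensional latent vector z, map it through two fixed random shallow networks g : R^8 → R^20 (inputs) and h : R^2 → R^20 (targets), so that y depends only on the first two latent coordinates and the task has optimal rank k = 2. The widths of the random networks, the activation, and the sample count are assumptions, not values stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shallow_net(d_in, d_out, width=100):
    """Return a fixed random shallow network R^{d_in} -> R^{d_out} (width and tanh are assumed)."""
    W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=(d_out, width)) / np.sqrt(width)
    return lambda z: W2 @ np.tanh(W1 @ z + b1)

g = random_shallow_net(d_in=8, d_out=20)   # x = g(z_1, ..., z_8) in R^20
h = random_shallow_net(d_in=2, d_out=20)   # y = h(z_1, z_2) in R^20

n_samples = 500                            # assumed; not specified in the excerpt
Z = rng.normal(size=(n_samples, 8))        # 8-dimensional latent vectors
X = np.stack([g(z) for z in Z])            # inputs in R^20
Y = np.stack([h(z[:2]) for z in Z])        # targets depend only on (z_1, z_2): rank-2 task
```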
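
And a minimal sketch of the experiment setup quoted in the last row: a depth L = 25 fully connected network of width 200, trained with an explicit L2 (ridge) penalty λ = 0.0002 on all parameters. The activation, optimizer, learning rate, step count, and the placeholder data are assumptions, since the paper's excerpt does not specify them.

```python
import torch
import torch.nn as nn

L, width, ridge = 25, 200, 2e-4            # depth, width, and ridge from the quoted setup
d_in, d_out = 20, 20

# Depth-L fully connected network (ReLU activation is an assumption).
layers = [nn.Linear(d_in, width), nn.ReLU()]
for _ in range(L - 2):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, d_out)]
model = nn.Sequential(*layers)

# Placeholder data; in the actual experiment these would be the synthetic
# (x, y) pairs generated as in the data sketch above.
X_t = torch.randn(500, d_in)
Y_t = torch.randn(500, d_out)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimizer and lr are assumptions

for step in range(5_000):
    opt.zero_grad()
    mse = ((model(X_t) - Y_t) ** 2).mean()
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = mse + ridge * l2                # ridge-regularized training cost
    loss.backward()
    opt.step()
```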