Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Authors: Arthur Jacot

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5, "Numerical Experiment: Symmetry Learning", and "We observe in Figure 2 that a large depth L2-regularized network trained on this task learns the rotation symmetry of the task and learns two dimensional hidden representations that are summary statistics"
Researcher Affiliation | Academia | Arthur Jacot, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, arthur.jacot@nyu.edu
Pseudocode | No | No pseudocode or algorithm block was found in the paper.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | No | "The training data is synthetic and designed to have an optimal rank k = 2." and "For the first numerical experiment, the data pairs (x, y) were generated as follows. First we sample an 8-dimensional latent vector z, from which we define x = g(z_1, ..., z_8) ∈ R^20 and y = h(z_1, z_2) ∈ R^20 for two random functions g : R^8 → R^20 and h : R^2 → R^20 given by two shallow networks with random parameters." (A data-generation sketch follows the table.)
Dataset Splits | No | The paper mentions 'training data' and 'final train cost' but does not specify exact train/validation/test splits, sample counts for each split, or cross-validation methodology.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | "A depth L = 25 network with a width of 200 trained on the task described in Section 5 with a ridge λ = 0.0002." and "The training data is synthetic and designed to have an optimal rank k = 2. We see different ranges of depth where the network converges to different ranks, with larger depths leading to smaller ranks, until training fails and recovers the zero parameters for L > 25. Within each range the norm ‖θ‖² is well approximated by an affine function with slope equal to the rank." (Training and rank-measurement sketches follow the table.)
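
The "Open Datasets" row describes the synthetic data-generation procedure. Below is a minimal sketch of that procedure, assuming NumPy, Gaussian latent vectors, and tanh shallow networks; the hidden width of the random networks g and h, the nonlinearity, and the sample count are assumptions not stated in the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shallow_net(d_in, d_out, width=100, rng=rng):
    """Random shallow (one-hidden-layer) map f: R^d_in -> R^d_out.
    The width and the tanh nonlinearity are assumptions; the paper only
    says 'shallow networks with random parameters'."""
    W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=(d_out, width)) / np.sqrt(width)
    return lambda z: np.tanh(z @ W1.T + b1) @ W2.T

g = random_shallow_net(d_in=8, d_out=20)   # x = g(z_1, ..., z_8)
h = random_shallow_net(d_in=2, d_out=20)   # y = h(z_1, z_2), so the optimal rank is k = 2

def sample_pairs(n, rng=rng):
    z = rng.normal(size=(n, 8))            # 8-dimensional latent vectors
    x = g(z)                               # inputs in R^20
    y = h(z[:, :2])                        # targets depend only on (z_1, z_2)
    return x, y

x_train, y_train = sample_pairs(1024)      # sample count is an assumption
```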
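The "Experiment Setup" row quotes a depth L = 25, width 200 network trained with ridge λ = 0.0002. The PyTorch sketch below continues from the data-generation sketch above and trains under those quoted settings; the ReLU nonlinearity, the Adam optimizer, the learning rate, and the number of steps are assumptions not given in the excerpt.

```python
import torch
import torch.nn as nn

L, width, ridge = 25, 200, 2e-4            # depth, width, and λ as quoted from the paper

def make_mlp(d_in=20, d_out=20, depth=L, width=width):
    """Fully connected network with `depth` weight layers."""
    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d_out))
    return nn.Sequential(*layers)

net = make_mlp()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.as_tensor(x_train, dtype=torch.float32)
y = torch.as_tensor(y_train, dtype=torch.float32)

for step in range(10_000):
    opt.zero_grad()
    mse = ((net(x) - y) ** 2).mean()
    # L2 ("ridge") regularization on all parameters with weight λ = 0.0002
    l2 = sum((p ** 2).sum() for p in net.parameters())
    loss = mse + ridge * l2
    loss.backward()
    opt.step()
```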
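Finally, the "Research Type" and "Experiment Setup" rows refer to low-dimensional (roughly rank-2) hidden representations at large depth. One way to probe such a claim is to compute the numerical rank of each hidden layer's activations, as sketched below, continuing from the training sketch; the SVD threshold and the choice to center the activations are assumptions, not the paper's exact measurement.

```python
import torch

@torch.no_grad()
def hidden_ranks(net, x, tol=1e-3):
    """Numerical rank of each hidden-layer representation.
    Singular values below tol * (largest singular value) are treated as zero."""
    ranks = []
    h = x
    for layer in net:
        h = layer(h)
        if isinstance(layer, torch.nn.ReLU):          # record post-activation features
            s = torch.linalg.svdvals(h - h.mean(0))   # singular values of centered activations
            ranks.append(int((s > tol * s[0]).sum()))
    return ranks

print(hidden_ranks(net, x))   # the paper reports roughly two-dimensional representations at large depth
```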