Neural collapse vs. low-rank bias: Is deep neural collapse really optimal?

Authors: Peter Súkeník, Christoph H. Lampert, Marco Mondelli

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We support our theoretical findings with experiments on both DUFM and real data, which show the emergence of the low-rank structure in the solution found by gradient descent." ... (Section 6, Numerical results)
Researcher Affiliation | Academia | Peter Súkeník, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, peter.sukenik@ista.ac.at; Christoph Lampert, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, chl@ista.ac.at; Marco Mondelli, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, marco.mondelli@ista.ac.at
Pseudocode | No | The paper describes mathematical constructions and procedures in prose, such as Definition 4 detailing the "strongly regular graph (SRG) solution", but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states only that "the source code will be provided on request"; no repository is released.
Open Datasets | Yes | "We support our theoretical results with empirical findings in three regimes: ... training on standard datasets (MNIST [31], CIFAR-10 [28]) with DUFM-like regularization..."
Dataset Splits | No | The paper uses standard datasets (MNIST, CIFAR-10) that come with predefined splits, but it does not explicitly state the training, validation, and test splits or percentages used for the experiments. The NeurIPS checklist also indicates that not all experimental details are fully provided.
Hardware Specification | No | The paper does not specify the compute resources used; the NeurIPS Paper Checklist states only that "The experiments do not require any specific hardware setup."
Software Dependencies | No | The paper mentions general frameworks and models such as "ResNet20" and "MLP head", but it does not provide specific version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | "In the top row of Figure 2, we consider a 4-DUFM, with K = 10 and n = 50, presenting the training progression... λ = 0.004 for all regularization parameters, learning rate of 0.5 and width 30. ... We use weight decay 0.005 except λ_H1 = 0.000005 (to compensate for n = 5000, which significantly influences the total regularization strength), learning rate 0.05 and width 64 for all the MLP layers."
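
The Experiment Setup row quotes the DUFM hyperparameters for the top row of Figure 2 (4-DUFM, K = 10, n = 50, width 30, λ = 0.004 for all regularization parameters, learning rate 0.5). The sketch below shows how such a run could plausibly be set up in PyTorch; it is not the authors' code. The layer bookkeeping for a "4-DUFM", the ReLU nonlinearity, the cross-entropy loss, the initialization, and the number of gradient-descent steps are all assumptions. The final loop prints the singular values of each trained weight matrix, which is one way to inspect the low-rank structure the paper reports.

import torch
import torch.nn.functional as F

# Hyperparameters quoted in the Experiment Setup row (top row of Figure 2).
K, n, width, n_layers = 10, 50, 30, 4      # classes, samples per class, layer width, DUFM depth
lam, lr, steps = 0.004, 0.5, 20_000        # regularization and learning rate; step count is a guess

N = K * n
labels = torch.arange(K).repeat_interleave(n)   # class index of each of the N = K*n samples

# In a (deep) unconstrained features model, the first-layer features H1 are free
# parameters trained jointly with the weight matrices; there is no backbone network.
H1 = torch.nn.Parameter(torch.randn(width, N))
hidden = [torch.nn.Parameter(torch.randn(width, width) / width ** 0.5) for _ in range(n_layers - 1)]
W_out = torch.nn.Parameter(torch.randn(K, width) / width ** 0.5)
params = [H1, *hidden, W_out]

opt = torch.optim.SGD(params, lr=lr)            # plain full-batch gradient descent

for _ in range(steps):
    opt.zero_grad()
    H = H1
    for W in hidden:
        H = F.relu(W @ H)                       # assumption: ReLU between the linear layers
    logits = W_out @ H                          # K x N class scores
    # Cross-entropy fit term plus the same lambda on the squared Frobenius norm of every
    # parameter block ("lambda = 0.004 for all regularization parameters").
    loss = F.cross_entropy(logits.T, labels) + lam * sum((p ** 2).sum() for p in params)
    loss.backward()
    opt.step()

# Inspect the singular values of each trained weight matrix; a rapid decay
# indicates the low-rank structure discussed in the paper.
for i, W in enumerate([*hidden, W_out], start=1):
    print(f"W{i} singular values:", torch.linalg.svdvals(W.detach()))

For the real-data runs quoted in the same row (weight decay 0.005, λ_H1 = 0.000005, learning rate 0.05, width-64 MLP layers), the analogous setup would replace the free parameter H1 with features produced by a backbone such as the ResNet20 mentioned under Software Dependencies, keeping the DUFM-like regularization on the MLP head.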