Neural collapse vs. low-rank bias: Is deep neural collapse really optimal?
Authors: Peter Súkeník, Christoph H. Lampert, Marco Mondelli
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our theoretical findings with experiments on both DUFM and real data, which show the emergence of the low-rank structure in the solution found by gradient descent. ... (Section 6, Numerical results) |
| Researcher Affiliation | Academia | Peter Súkeník, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, peter.sukenik@ista.ac.at; Christoph Lampert, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, chl@ista.ac.at; Marco Mondelli, Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria, marco.mondelli@ista.ac.at |
| Pseudocode | No | The paper describes mathematical constructions and procedures in prose, such as Definition 4 detailing the 'strongly regular graph (SRG) solution', but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The source code will be provided on request. |
| Open Datasets | Yes | We support our theoretical results with empirical findings in three regimes: ... training on standard datasets (MNIST [31], CIFAR-10 [28]) with DUFM-like regularization... |
| Dataset Splits | No | The paper states it uses standard datasets (MNIST, CIFAR-10) which have predefined splits, but it does not explicitly provide the training, validation, and test dataset splits or percentages used for the experiments. The NeurIPS checklist also indicates that not all experimental details are fully provided. |
| Hardware Specification | No | The paper does not describe the computing hardware used for its experiments; the NeurIPS Paper Checklist only notes that 'The experiments do not require any specific hardware setup.' |
| Software Dependencies | No | The paper mentions the use of general frameworks and models such as 'ResNet20' and 'MLP head', but it does not provide specific version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | In the top row of Figure 2, we consider a 4-DUFM, with K = 10 and n = 50, presenting the training progression... λ = 0.004 for all regularization parameters, learning rate of 0.5 and width 30. ... We use weight decay 0.005 except λ_H1 = 0.000005 (to compensate for n = 5000, which significantly influences the total regularization strength), learning rate 0.05 and width 64 for all the MLP layers. A minimal code sketch of the DUFM setup follows the table. |
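To make the quoted DUFM hyperparameters concrete (K = 10, n = 50, depth 4, width 30, λ = 0.004, learning rate 0.5), here is a minimal sketch of what such a training run could look like in PyTorch. It assumes full-batch gradient descent on a cross-entropy objective, ReLU activations between the linear layers, and a particular step count and initialization scale; none of these choices is stated in the table above, and the paper's exact setup may differ.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, n, width, L = 10, 50, 30, 4           # classes, samples per class, layer width, depth (4-DUFM)
lam, lr, steps = 0.004, 0.5, 2000        # regularization strength, learning rate, GD steps (step count assumed)

# Freely trainable features H1 (one column per training sample) -- the "unconstrained" part of the DUFM.
H1 = (0.1 * torch.randn(width, K * n)).requires_grad_(True)
# Intermediate layers W2..W_{L-1} (width x width) and a final K x width classifier WL.
Ws = [((width ** -0.5) * torch.randn(width, width)).requires_grad_(True) for _ in range(L - 2)]
Ws.append(((width ** -0.5) * torch.randn(K, width)).requires_grad_(True))

labels = torch.arange(K).repeat_interleave(n)   # class index for each column of H1
opt = torch.optim.SGD([H1] + Ws, lr=lr)

for step in range(steps):
    opt.zero_grad()
    Z = H1
    for W in Ws[:-1]:
        Z = torch.relu(W @ Z)                   # assumption: ReLU between layers
    logits = Ws[-1] @ Z                         # shape (K, K*n)
    loss = F.cross_entropy(logits.T, labels)    # assumption: cross-entropy loss
    # The same regularization strength lambda on every trainable matrix, including H1.
    reg = lam * sum((M ** 2).sum() for M in [H1] + Ws)
    (loss + reg).backward()
    opt.step()

# Count singular values of the learned features above a relative threshold: a rank well
# below K would point to the low-rank structure the paper reports, as opposed to the
# rank expected under (deep) neural collapse.
svals = torch.linalg.svdvals(H1.detach())
print("effective rank of H1:", int((svals > 1e-3 * svals[0]).sum()))
```

The real-data experiments quoted in the same row (ResNet20 backbone with an MLP head, weight decay 0.005, λ_H1 = 0.000005, learning rate 0.05, width 64) are not reproduced here; the table does not give enough detail to sketch them without further assumptions.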