Extended Unconstrained Features Model for Exploring Deep Neural Collapse
Authors: Tom Tirer, Joan Bruna
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate the usefulness of our nonlinear extended UFM in modeling the NC phenomenon that occurs with practical networks. |
| Researcher Affiliation | Academia | Center for Data Science, New York University, New York; Courant Institute of Mathematical Sciences, New York University, New York. Correspondence to: Tom Tirer <tirer.tom@gmail.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Finally, we show the similarity of the NC metrics that are obtained for the nonlinear extended UFM in Figure 4 (rather than those in Figure 3) and metrics obtained by a practical well-trained DNN, namely ResNet18 (He et al., 2016) (composed of 4 ResBlocks), trained on MNIST with SGD with learning rate 0.05 (divided by 10 every 40 epochs) and weight decay (L2 regularization) of 5e-4. Figure 5 shows the results for two cases: 1) MSE loss without bias in the FC layer; and 2) the widely-used setting, with cross-entropy loss and bias. (Additional experiments with the CIFAR10 dataset appear in Appendix G.) |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR10 but does not specify the train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions ResNet18 but does not provide specific software dependencies with version numbers (e.g., library names, framework versions). |
| Experiment Setup | Yes | Figure 1 corroborates Theorem 3.1 for K = 4, d = 20, n = 50 and λW = λH = 0.005 (no bias is used; equivalently, λb → ∞). Both W and H are initialized with a standard normal distribution and are optimized with plain gradient descent with step-size 0.1. ... trained on MNIST with SGD with learning rate 0.05 (divided by 10 every 40 epochs) and weight decay (L2 regularization) of 5e-4. (Minimal sketches of both quoted setups appear below the table.) |
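The "Experiment Setup" row quotes two configurations. The first is the plain UFM experiment behind the paper's Figure 1. The sketch below is not the authors' code; it is a minimal PyTorch rendering of that setup under stated assumptions: the regularized MSE-loss UFM objective with a 1/(2N) normalization, the iteration count, and the closing within-class variability check are assumptions beyond what the excerpt states, while K, d, n, λW, λH, the standard normal initialization, and the step size 0.1 are taken from the quote.

```python
import torch

K, d, n = 4, 20, 50           # classes, feature dimension, samples per class (from the quote)
N = K * n
lam_W = lam_H = 0.005         # regularization weights from the quote; no bias term is used
lr, steps = 0.1, 20000        # step size from the quote; iteration count is an assumption

# One-hot targets Y (K x N); class k occupies columns k*n ... (k+1)*n - 1
Y = torch.zeros(K, N)
for k in range(K):
    Y[k, k * n:(k + 1) * n] = 1.0

# Unconstrained features H (d x N) and linear classifier W (K x d), standard normal init
W = torch.randn(K, d, requires_grad=True)
H = torch.randn(d, N, requires_grad=True)

for _ in range(steps):
    # Regularized MSE-loss UFM objective (the 1/(2N) normalization is an assumption)
    loss = (0.5 / N) * (W @ H - Y).pow(2).sum() \
         + 0.5 * lam_W * W.pow(2).sum() + 0.5 * lam_H * H.pow(2).sum()
    loss.backward()
    with torch.no_grad():     # plain gradient descent: no momentum, no optimizer state
        W -= lr * W.grad
        H -= lr * H.grad
        W.grad.zero_()
        H.grad.zero_()

# Rough within-class variability check (features collapsing toward their class means)
Hc = H.detach().reshape(d, K, n)
within = (Hc - Hc.mean(dim=2, keepdim=True)).pow(2).mean()
print(f"final loss {loss.item():.4f}, within-class variance {within.item():.2e}")
```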
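The second configuration is the ResNet18/MNIST recipe also quoted in the "Open Datasets" row: SGD with learning rate 0.05 divided by 10 every 40 epochs and weight decay 5e-4, with cross-entropy loss and bias in the widely-used setting. The sketch below uses torchvision's stock ResNet18 as a stand-in for the paper's 4-ResBlock variant; the data pipeline, batch size, momentum value, and epoch count are assumptions not given in the excerpt.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# MNIST is grayscale; replicating to 3 channels lets the stock ResNet18 accept it (assumption)
tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])
train_set = datasets.MNIST("./data", train=True, download=True, transform=tfm)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)    # batch size assumed

model = models.resnet18(num_classes=10).to(device)    # stand-in for the paper's 4-ResBlock net
criterion = nn.CrossEntropyLoss()                     # case 2 in the quote: CE loss with bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)          # momentum assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

epochs = 120                                          # epoch count assumed
for epoch in range(epochs):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()                                  # learning rate divided by 10 every 40 epochs
```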