A Geometric Analysis of Neural Collapse with Unconstrained Features
Authors: Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we run extensive experiments not only verifying our theoretical results on modern neural networks, but also demonstrating the potential practical benefits of understanding NC. More specifically, while Theorem 3.2 holds true for the simplified unconstrained feature model, in Section 4.1 we run experiments on practical network architectures and show that our analysis of simplified models captures the gist of NC. |
| Researcher Affiliation | Collaboration | Zhihui Zhu (University of Denver, zhihui.zhu@du.edu); Tianyu Ding (Johns Hopkins University, tding1@jhu.edu); Jinxin Zhou (University of Denver, jinxin.zhou@du.edu); Xiao Li (University of Michigan, xlxiao@umich.edu); Chong You (Google Research, cyou@google.com); Jeremias Sulam (Johns Hopkins University, jsulam1@jhu.edu); Qing Qu (University of Michigan, qingqu@umich.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/tding1/Neural-Collapse. |
| Open Datasets | Yes | In Section 4.1 and Section 4.2, we train a ResNet18 architecture [18] on CIFAR10 [21] for image classification using the cross-entropy loss (2). Due to limited space, we present all the results on MNIST [90] in the Appendix. |
| Dataset Splits | No | The paper trains on CIFAR10 and MNIST, which have standard splits, but it does not explicitly state the training, validation, and test splits (e.g., percentages or sample counts) or mention the use of a validation set in its experimental setup; it refers only to 'training' and 'testing' accuracy. |
| Hardware Specification | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs. |
| Software Dependencies | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs. |
| Experiment Setup | Yes | We train the network for 200 epochs with three distinct optimizers: two first-order methods (SGD and Adam) and one second-order method (LBFGS [69]). In particular, we use SGD with momentum 0.9, Adam with β1 = 0.9, β2 = 0.999, and LBFGS with a memory size of 10. The initial learning rates for SGD and Adam are set to 0.05 and 0.001, respectively, and decreased by a factor of 10 every 40 epochs. For LBFGS, we use an initial learning rate of 0.1 and employ a strong Wolfe line-search strategy for subsequent iterations. Unless otherwise specified, the weight decay is set to 5 × 10⁻⁴ for all the experiments. |
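The reported setup (ResNet18 on CIFAR10 with cross-entropy loss, 200 epochs, SGD/Adam/LBFGS with the stated hyperparameters, and a 10x learning-rate decay every 40 epochs) maps onto a short PyTorch configuration. The sketch below is an assumption-laden reconstruction, not the authors' released code (see the GitHub link above): the batch size, data transforms, and CIFAR-adapted ResNet18 stem are guesses, and PyTorch's LBFGS exposes no weight-decay argument, so the 5 × 10⁻⁴ weight decay is applied only to SGD and Adam here.

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.models import resnet18

# Hypothetical reconstruction of the reported training configuration.
# Batch size, transforms, and the ResNet18 stem are assumptions.
transform = transforms.Compose([transforms.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)

model = resnet18(num_classes=10)          # CIFAR10 has 10 classes
criterion = torch.nn.CrossEntropyLoss()   # the cross-entropy loss (2)

# Optimizers as reported: SGD (lr 0.05, momentum 0.9), Adam (lr 0.001,
# betas (0.9, 0.999)), LBFGS (lr 0.1, memory size 10, strong Wolfe line search).
# Weight decay 5e-4 is applied to SGD/Adam; PyTorch's LBFGS has no such argument.
optimizers = {
    "sgd": torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9,
                           weight_decay=5e-4),
    "adam": torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                             weight_decay=5e-4),
    "lbfgs": torch.optim.LBFGS(model.parameters(), lr=0.1, history_size=10,
                               line_search_fn="strong_wolfe"),
}
optimizer = optimizers["sgd"]

# Reported schedule for SGD/Adam: decay the learning rate by 10x every 40 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(200):                  # 200 epochs as reported
    for images, labels in train_loader:
        def closure():
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            return loss
        # LBFGS requires a closure; SGD and Adam also accept one,
        # so the same loop body serves all three optimizers.
        optimizer.step(closure)
    scheduler.step()
```

Swapping `optimizer = optimizers["sgd"]` for the `"adam"` or `"lbfgs"` entry reproduces the other two reported configurations under the same loop.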