A Geometric Analysis of Neural Collapse with Unconstrained Features

Authors: Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we run extensive experiments not only verifying our theoretical results on modern neural networks, but also demonstrating the potential practical benefits of understanding NC. More specifically, while Theorem 3.2 holds true for the simplified unconstrained feature model, in Section 4.1 we run experiments on practical network architectures and show that our analysis of simplified models captures the gist of NC.
Researcher Affiliation | Collaboration | Zhihui Zhu, University of Denver, zhihui.zhu@du.edu; Tianyu Ding, Johns Hopkins University, tding1@jhu.edu; Jinxin Zhou, University of Denver, jinxin.zhou@du.edu; Xiao Li, University of Michigan, xlxiao@umich.edu; Chong You, Google Research, cyou@google.com; Jeremias Sulam, Johns Hopkins University, jsulam1@jhu.edu; Qing Qu, University of Michigan, qingqu@umich.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/tding1/Neural-Collapse.
Open Datasets | Yes | In Section 4.1 and Section 4.2, we train a ResNet18 architecture [18] on CIFAR10 [21] for image classification using the cross-entropy loss (2). Due to limited space, we present all the results on MNIST [90] in the Appendix.
Dataset Splits | No | The paper mentions training on CIFAR10 and MNIST, which have standard splits, but it does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) or mention a validation set in its experimental setup; it refers only to 'training' and 'testing' accuracy.
Hardware Specification | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs.
Software Dependencies | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs.
Experiment Setup | Yes | We train the network for 200 epochs with three distinct optimizers: two first-order methods (SGD and Adam) and one second-order method (LBFGS [69]). In particular, we use SGD with momentum 0.9, Adam with β1 = 0.9, β2 = 0.999, and LBFGS with a memory size of 10. The initial learning rates for SGD and Adam are set to 0.05 and 0.001, respectively, and decreased by a factor of 10 every 40 epochs. For LBFGS, we use an initial learning rate of 0.1 and employ a strong Wolfe line-search strategy for subsequent iterations. Except otherwise specified, the weight decay is set to 5 × 10⁻⁴ for all the experiments. (A minimal configuration sketch follows the table.)
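
To make the quoted setup concrete, below is a minimal PyTorch 1.9-style sketch of the reported training configuration (CIFAR10, ResNet18, cross-entropy, and the three optimizers with the stated hyperparameters). The stock torchvision ResNet18, the normalization constants, and the batch size are assumptions made for illustration; the authors' exact data pipeline and model definition are in the linked repository (https://github.com/tding1/Neural-Collapse), not reproduced here.

```python
# Minimal sketch of the reported setup; not the authors' exact training script.
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR10 train split (50,000 images); the transform and batch size
# are illustrative assumptions, not values stated in the paper.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stock torchvision ResNet18 with a 10-way head, trained with cross-entropy (Eq. (2)).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = torch.nn.CrossEntropyLoss()

wd = 5e-4  # weight decay 5 x 10^-4, as reported

# The three optimizers reported in the paper (one per training run).
sgd = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=wd)
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999),
                        weight_decay=wd)
# torch.optim.LBFGS has no weight_decay argument; any regularization would
# have to be added to the loss by hand.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.1, history_size=10,
                          line_search_fn="strong_wolfe")

# Learning rate decayed by a factor of 10 every 40 epochs (shown here for SGD).
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=40, gamma=0.1)
```

With this configuration, a run would loop over 200 epochs, stepping the chosen optimizer on each mini-batch and calling `scheduler.step()` once per epoch for the first-order methods; LBFGS instead requires a closure that re-evaluates the loss, per the standard PyTorch interface.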