A Geometric Analysis of Neural Collapse with Unconstrained Features

Authors: Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we run extensive experiments not only verifying our theoretical results on modern neural networks, but also demonstrating the potential practical benefits of understanding NC. More specifically, while Theorem 3.2 holds true for the simplified unconstrained feature model, in Section 4.1 we run experiments on practical network architectures and show that our analysis of simplified models captures the gist of NC.
Researcher Affiliation | Collaboration | Zhihui Zhu, University of Denver, zhihui.zhu@du.edu; Tianyu Ding, Johns Hopkins University, tding1@jhu.edu; Jinxin Zhou, University of Denver, jinxin.zhou@du.edu; Xiao Li, University of Michigan, xlxiao@umich.edu; Chong You, Google Research, cyou@google.com; Jeremias Sulam, Johns Hopkins University, jsulam1@jhu.edu; Qing Qu, University of Michigan, qingqu@umich.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/tding1/Neural-Collapse.
Open Datasets | Yes | In Section 4.1 and Section 4.2, we train a ResNet18 architecture [18] on CIFAR10 [21] for image classification using the cross-entropy loss (2). Due to limited space, we present all the results on MNIST [90] in the Appendix.
Dataset Splits | No | The paper mentions training on CIFAR10 and MNIST, which have standard splits, but it does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) or mention a validation set in its experimental setup; it refers only to 'training' and 'testing' accuracy.
Hardware Specification | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs.
Software Dependencies | Yes | All experiments are conducted using PyTorch 1.9.0 with CUDA 11.1 on NVIDIA Tesla A100 GPUs.
Experiment Setup | Yes | We train the network for 200 epochs with three distinct optimizers: two first-order methods (SGD and Adam) and one second-order method (LBFGS [69]). In particular, we use SGD with momentum 0.9, Adam with β1 = 0.9, β2 = 0.999, and LBFGS with a memory size of 10. The initial learning rates for SGD and Adam are set to 0.05 and 0.001, respectively, and decreased by a factor of 10 every 40 epochs. For LBFGS, we use an initial learning rate of 0.1 and employ a strong Wolfe line-search strategy for subsequent iterations. Except otherwise specified, the weight decay is set to 5 × 10⁻⁴ for all the experiments. (A minimal configuration sketch follows the table.)
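
To make the quoted setup concrete, below is a minimal PyTorch 1.9-style sketch of the reported training configuration (CIFAR10, ResNet18, cross-entropy, and the three optimizers with the stated hyperparameters). The stock torchvision ResNet18, the normalization constants, and the batch size are assumptions made for illustration; the authors' exact data pipeline and model definition are in the linked repository (https://github.com/tding1/Neural-Collapse), not reproduced here.

```python
# Minimal sketch of the reported setup; not the authors' exact training script.
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR10 train split (50,000 images); the transform and batch size
# are illustrative assumptions, not values stated in the paper.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stock torchvision ResNet18 with a 10-way head, trained with cross-entropy (Eq. (2)).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = torch.nn.CrossEntropyLoss()

wd = 5e-4  # weight decay 5 x 10^-4, as reported

# The three optimizers reported in the paper (one per training run).
sgd = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=wd)
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999),
                        weight_decay=wd)
# torch.optim.LBFGS has no weight_decay argument; any regularization would
# have to be added to the loss by hand.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.1, history_size=10,
                          line_search_fn="strong_wolfe")

# Learning rate decayed by a factor of 10 every 40 epochs (shown here for SGD).
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=40, gamma=0.1)
```

With this configuration, a run would loop over 200 epochs, stepping the chosen optimizer on each mini-batch and calling `scheduler.step()` once per epoch for the first-order methods; LBFGS instead requires a closure that re-evaluates the loss, per the standard PyTorch interface.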