Generalized Neural Collapse for a Large Number of Classes

Authors: Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin G. Mixon, Chong You, Zhihui Zhu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an empirical study to verify the prevalence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study to show that generalized neural collapse provably occurs under an unconstrained feature model with a spherical constraint, subject to specific technical conditions on the feature dimension and the number of classes. Empirically, we verify that GNC approximately holds in practical DNNs trained with a small temperature in the CE loss (see the unconstrained-feature-model sketch after the table).
Researcher Affiliation | Collaboration | 1) Department of Computer Science, The Ohio State University, Columbus, OH, USA; 2) Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, USA; 3) Department of Mathematics, The Ohio State University, Columbus, OH, USA; 4) Google Research, New York City, NY, USA.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes procedures using mathematical formulations and textual explanations.
Open Source Code | No | The paper does not include an unambiguous statement by the authors that they are releasing the source code for the methodology described in this paper, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We verify the occurrence of GNC by training a ResNet18 on the CIFAR100 dataset (Krizhevsky, 2009), and report the results in Figure 2. We train a ResNet18 network on four classes {Automobile, Cat, Dog, Truck} from the CIFAR10 dataset. We train ResNet18, DenseNet121, and ResNeXt50 networks on the CIFAR100, Tiny-ImageNet, and BUPT-CBFace-50 datasets using the CE loss.
Dataset Splits | No | The paper describes the use of datasets for training and testing, and mentions an 'in-distribution (ID) task' and an 'out-of-distribution (OOD) task' for fine-tuning, but does not provide specific train/validation/test split percentages, absolute sample counts for each split, or references to predefined splits for reproducibility.
Hardware Specification | Yes | Additionally, all experiments were conducted on 4x V100 GPUs with 32 GB memory.
Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and schedulers (Cosine Annealing) but does not provide specific software names with version numbers for reproducibility (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x').
Experiment Setup | Yes | For optimization, we utilized SGD with a momentum of 0.9 and an initial learning rate of 0.1, which decayed according to a Cosine Annealing schedule over a span of 200 epochs. We set the feature dimension to d = 20 and the temperature parameter to τ = 0.1. We employed the Adam optimizer with a learning rate of 1e-5 and utilized the Cosine Annealing scheduler. The models are fine-tuned for 5 epochs with a batch size of 100 (see the training and fine-tuning sketches after the table).
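
The Research Type entry notes that GNC provably occurs under an unconstrained feature model with a spherical constraint and a small CE temperature, and the Experiment Setup entry reports d = 20 and τ = 0.1 for this model. Below is a minimal PyTorch sketch of that setup, not the authors' code: the features and classifier vectors are free variables re-normalized onto the unit sphere at every step, while the class count, samples per class, step count, and optimizer settings are illustrative assumptions.

    # Minimal sketch (not the authors' code) of the unconstrained feature model with a
    # spherical constraint: the per-sample features H and the classifier W are free
    # variables, projected onto the unit sphere before a temperature-scaled CE loss.
    # d = 20 and tau = 0.1 follow the Experiment Setup entry; K, n, the number of
    # steps, and the SGD settings are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    K, n, d, tau = 100, 10, 20, 0.1                # classes, samples per class, feature dim, temperature

    W = torch.nn.Parameter(torch.randn(K, d))      # one classifier vector per class
    H = torch.nn.Parameter(torch.randn(K * n, d))  # one free feature vector per sample
    labels = torch.arange(K).repeat_interleave(n)

    opt = torch.optim.SGD([W, H], lr=0.1, momentum=0.9)

    for step in range(2000):
        # Enforce the spherical constraint by normalizing onto the unit sphere.
        W_sph = F.normalize(W, dim=1)
        H_sph = F.normalize(H, dim=1)
        logits = H_sph @ W_sph.T / tau             # a small temperature sharpens the softmax
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()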
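
The Open Datasets and Experiment Setup entries together describe training a ResNet18 on CIFAR100 with SGD (momentum 0.9, initial learning rate 0.1) under a Cosine Annealing schedule for 200 epochs and a CE loss with a small temperature. The sketch below wires those reported values together using torchvision's CIFAR100 and resnet18; the batch size, transforms, and the omitted reduction of the feature dimension to d = 20 are assumptions or simplifications, not details taken from the rows above.

    # Minimal sketch (not the authors' code) of the reported DNN training setup:
    # ResNet18 on CIFAR100, SGD with momentum 0.9 and initial lr 0.1,
    # CosineAnnealingLR over 200 epochs, and CE loss scaled by tau = 0.1.
    # Batch size and transforms are illustrative; the modified projection head that
    # would give a d = 20 feature dimension is omitted for brevity.
    import torch
    import torch.nn.functional as F
    import torchvision
    import torchvision.transforms as T

    train_set = torchvision.datasets.CIFAR100(root="./data", train=True,
                                              download=True, transform=T.ToTensor())
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    model = torchvision.models.resnet18(num_classes=100)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
    tau = 0.1

    for epoch in range(200):
        for images, labels in train_loader:
            logits = model(images) / tau           # temperature-scaled CE loss
            loss = F.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                           # cosine decay of the learning rate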
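
The Experiment Setup entry also reports fine-tuning with Adam at a learning rate of 1e-5, a Cosine Annealing scheduler, 5 epochs, and a batch size of 100. The helper below is a sketch of just that optimizer/scheduler wiring; model and finetune_set are hypothetical placeholders for the pretrained network and downstream (ID or OOD) dataset, which the row does not specify.

    # Minimal sketch (not the authors' code) of the reported fine-tuning loop:
    # Adam with lr 1e-5, CosineAnnealingLR, 5 epochs, batch size 100.
    # `model` and `finetune_set` are placeholders, not names from the paper.
    import torch
    import torch.nn.functional as F

    def finetune(model, finetune_set, epochs=5, batch_size=100):
        loader = torch.utils.data.DataLoader(finetune_set, batch_size=batch_size, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
        for _ in range(epochs):
            for images, labels in loader:
                loss = F.cross_entropy(model(images), labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            scheduler.step()
        return model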