Generalized Neural Collapse for a Large Number of Classes
Authors: Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin G. Mixon, Chong You, Zhihui Zhu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an empirical study to verify the prevalence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study showing that generalized neural collapse provably occurs under an unconstrained feature model with a spherical constraint, subject to specific technical conditions on the feature dimension and the number of classes. Empirically, we verify that GNC approximately holds in practical DNNs trained with a small temperature in the CE loss. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, The Ohio State University, Columbus, OH, USA 2Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, MI, USA 3Department of Mathematics, The Ohio State University, Columbus, OH, USA 4Google Research, New York City, NY, USA. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes procedures using mathematical formulations and textual explanations. |
| Open Source Code | No | The paper does not include an unambiguous statement by the authors that they are releasing the source code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We verify the occurrence of GNC by training a ResNet18 on the CIFAR100 dataset (Krizhevsky, 2009), and report the results in Figure 2. We train a ResNet18 network on four classes {Automobile, Cat, Dog, Truck} from the CIFAR10 dataset. We train ResNet18, DenseNet121, and ResNeXt50 networks on the CIFAR100, Tiny-ImageNet and BUPT-CBFace-50 datasets using CE loss. |
| Dataset Splits | No | The paper describes the use of datasets for training and testing, and mentions 'in-distribution (ID) task' and 'out-of-distribution (OOD) task' for fine-tuning, but does not provide specific train/validation/test dataset split percentages, absolute sample counts for each split, or references to predefined splits for reproducibility. |
| Hardware Specification | Yes | Additionally, all experiments were conducted on 4x V100 GPUs with 32GB memory. |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and schedulers (Cosine Annealing) but does not provide specific software names with version numbers for reproducibility (e.g., 'PyTorch 1.x' or 'TensorFlow 2.x'). |
| Experiment Setup | Yes | For optimization, we utilized SGD with a momentum of 0.9 and an initial learning rate of 0.1, which decayed according to a Cosine Annealing schedule over a span of 200 epochs. We set the feature dimension to d = 20 and the temperature parameter to τ = 0.1. For fine-tuning, we employed the Adam optimizer with a learning rate of 1e-5 and utilized the Cosine Annealing scheduler. The models were fine-tuned for 5 epochs with a batch size of 100. |
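
The Experiment Setup row describes the main training configuration but, as noted under Software Dependencies, no framework is named and no code is released. Below is a minimal sketch of that configuration, assuming PyTorch/torchvision, a standard CIFAR100 loader, and placeholder values for anything the paper does not report (batch size, data augmentation, and the exact CIFAR-adapted ResNet18 variant):

```python
# Hedged sketch of the quoted training setup; PyTorch/torchvision is an
# assumption, since the paper names no framework and releases no code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 100   # CIFAR100
FEATURE_DIM = 20    # d = 20, the quoted feature dimension
TAU = 0.1           # temperature parameter in the CE loss
EPOCHS = 200

# Standard torchvision ResNet18 reused as a d-dimensional feature extractor,
# followed by a linear classifier; the paper's exact architecture variant for
# CIFAR-sized inputs is not specified.
backbone = models.resnet18(num_classes=FEATURE_DIM)
classifier = nn.Linear(FEATURE_DIM, NUM_CLASSES, bias=False)
params = list(backbone.parameters()) + list(classifier.parameters())

# SGD with momentum 0.9 and initial lr 0.1, decayed by cosine annealing over 200 epochs.
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# CIFAR100 training loader; batch size 128 and the bare ToTensor transform
# are placeholders, not values reported in the paper.
train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

def temperature_ce(images, labels):
    # Features are normalized onto the unit sphere (the spherical constraint),
    # and the logits are scaled by 1/tau before the cross-entropy loss.
    features = F.normalize(backbone(images), dim=1)
    logits = classifier(features) / TAU
    return F.cross_entropy(logits, labels)

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        loss = temperature_ce(images, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Scaling the logits by 1/τ with τ = 0.1 is the "small temperature in the CE loss" under which the paper reports that GNC approximately holds.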
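The fine-tuning schedule quoted in the same row (Adam, learning rate 1e-5, cosine annealing, 5 epochs, batch size 100) can be sketched as a continuation of the snippet above, reusing `params` and `temperature_ce`; the fine-tuning data are described only at the level of ID/OOD tasks, so `finetune_loader` below is a hypothetical placeholder:

```python
# Hedged continuation of the sketch above for the quoted fine-tuning schedule.
# `finetune_loader` is a hypothetical DataLoader with batch_size=100; the paper
# does not specify the fine-tuning datasets in reproducible detail.
finetune_opt = torch.optim.Adam(params, lr=1e-5)
finetune_sched = torch.optim.lr_scheduler.CosineAnnealingLR(finetune_opt, T_max=5)

for epoch in range(5):
    for images, labels in finetune_loader:
        loss = temperature_ce(images, labels)
        finetune_opt.zero_grad()
        loss.backward()
        finetune_opt.step()
    finetune_sched.step()
```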