Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap

Authors: Weiyang Liu, Longhui Yu, Adrian Weller, Bernhard Schölkopf

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that HUG works well in terms of generalization and robustness. Our experiments aim to demonstrate the empirical effectiveness of HUG, so we focus on a fair comparison to the popular CE loss under the same setting.
Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen; 2 University of Cambridge; 3 Peking University; 4 The Alan Turing Institute
Pseudocode | No | No explicit pseudocode or algorithm block was found in the paper.
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code for their method is released.
Open Datasets | Yes | Specifically, we train a convolutional neural network (CNN) on MNIST with feature dimension 2. To see whether the same conclusion holds for higher feature dimensions, we also train two CNNs on CIFAR-100 with feature dimensions 64 and 128, respectively. We find that the GNC hypothesis remains valid and informative even under the scenario of a large number of classes (we use the 1000-class ImageNet-2012 dataset [13] here). (A feature-dimension sketch appears after the table.)
Dataset Splits | Yes | We split both the CIFAR-10 and CIFAR-100 training sets into 5 tasks. We follow LDAM [6] to obtain imbalanced CIFAR-10 and CIFAR-100 datasets with different imbalance ratios. (See the long-tailed split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as PyTorch or TensorFlow versions.
Experiment Setup | Yes | For MHE-HUG and MHS-HUG, α and β are set to 0.15 and 0.015, respectively. For MGD-HUG, α and β are set to 0.15 and 0.03, respectively. We train the model for 200 epochs with batch size 512 for both the cross-entropy (CE) loss and HUG. We use stochastic gradient descent with momentum 0.9 and weight decay 2e-4. The initial learning rate is set to 0.1 for both CIFAR-100 and CIFAR-10 and is divided by 10 at epochs 60, 120, and 180. (See the training-setup sketch after the table.)
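
The "Open Datasets" row mentions CNNs trained with feature dimensions 2 (MNIST) and 64/128 (CIFAR-100). A minimal sketch of a classifier with a configurable penultimate feature dimension is given below; the backbone layers and the SmallCNN name are assumptions for illustration, not the authors' architecture.

```python
# Hypothetical CNN with a configurable feature dimension (e.g. 2, 64, or 128).
# The backbone here is an assumed placeholder, not the paper's exact network.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, in_channels=3, feat_dim=64, num_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(128, feat_dim)        # penultimate features of dimension feat_dim
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.proj(self.backbone(x))          # features whose geometry is studied
        return feat, self.classifier(feat)

# Example: CIFAR-100 model with feature dimension 128.
model = SmallCNN(in_channels=3, feat_dim=128, num_classes=100)
```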
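For the "Dataset Splits" row, the paper follows LDAM [6] to build imbalanced CIFAR variants with different imbalance ratios. The sketch below constructs a long-tailed index set with an exponentially decaying per-class sample count, which is the common LDAM-style recipe; the exact construction and the helper name longtail_indices are assumptions, not the authors' released code.

```python
# Hedged sketch: LDAM-style long-tailed CIFAR-10 subset with a given imbalance ratio.
import numpy as np
from torchvision.datasets import CIFAR10

def longtail_indices(targets, num_classes=10, imb_ratio=100, max_per_class=5000, seed=0):
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets)
    keep = []
    for c in range(num_classes):
        # Class c keeps max_per_class * (1/imb_ratio)^(c/(num_classes-1)) samples,
        # so class 0 is untouched and the last class is max_per_class/imb_ratio.
        n_c = int(max_per_class * (1.0 / imb_ratio) ** (c / (num_classes - 1)))
        idx_c = np.where(targets == c)[0]
        keep.append(rng.choice(idx_c, size=n_c, replace=False))
    return np.concatenate(keep)

train = CIFAR10(root="./data", train=True, download=True)
subset_idx = longtail_indices(train.targets, imb_ratio=100)  # indices of the imbalanced subset
```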
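The "Experiment Setup" row fully specifies the optimization recipe, so it can be written down as a configuration sketch. This is a minimal PyTorch rendering of the reported hyperparameters, assuming a standard epoch-wise step schedule; the model, dataset, and loss objects are placeholders, and the HUG weights α and β would enter the loss term rather than the optimizer.

```python
# Minimal sketch of the reported setup: SGD (momentum 0.9, weight decay 2e-4),
# batch size 512, lr 0.1 divided by 10 at epochs 60/120/180, 200 epochs total.
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader

def make_training(model, train_set):
    loader = DataLoader(train_set, batch_size=512, shuffle=True, num_workers=4)
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=2e-4)
    scheduler = MultiStepLR(optimizer, milestones=[60, 120, 180], gamma=0.1)
    return loader, optimizer, scheduler

# Training skeleton: step the scheduler once per epoch for 200 epochs.
# for epoch in range(200):
#     for x, y in loader:
#         ...  # forward pass, CE or HUG loss, optimizer.step()
#     scheduler.step()
```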