The Prevalence of Neural Collapse in Neural Multivariate Regression

Authors: George Andriopoulos, Zixuan Dong, Li Guo, Zifan Zhao, Keith Ross

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the n principal components of the feature vectors, where n is the dimension of the targets (for univariate regression, n = 1); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM)
Researcher Affiliation Academia George Andriopoulos1 Zixuan Dong2,4 Li Guo3 Zifan Zhao3 Keith Ross1 1 New York University Abu Dhabi 2 SFSC of AI and DL, NYU Shanghai 3 New York University Shanghai 4 New York University
Pseudocode No The paper does not contain any pseudocode or algorithm blocks.
Open Source Code Yes We upload the code with environment in the supplemental materials.
Open Datasets Yes The Swimmer, Reacher, and Hopper datasets are based on MuJoCo [Todorov et al., 2012, Brockman et al., 2016, Towers et al., 2023], a physics engine that simulates diverse continuous multi-joint robot controls and has been a canonical benchmark for deep reinforcement learning research. In our experiments, we use publicly available expert datasets (see appendix A.1). ... The CARLA dataset originates from the CARLA Simulator, an open-source project designed to support the development of autonomous driving systems. We utilize a dataset Codevilla et al. [2018] ... The UTKFace dataset [Zhang et al., 2017] is widely used in computer vision to study age estimation from facial images of humans.
Dataset Splits Yes For each environment, we also take a subset of the full validation (test) dataset and keep the number of data 20% of training data size.
Hardware Specification Yes Compute resources Intel(R) Xeon(R) Platinum 8268 CPU (from Table 2); Compute resources NVIDIA A100 8358 80GB (from Table 3)
Software Dependencies No The paper mentions the MuJoCo physics engine and the ResNet architecture, but does not specify version numbers for any libraries, frameworks, or programming languages used in the experiments.
Experiment Setup Yes Table 2: Hyperparameter settings for experiments with weight decay on MuJoCo datasets. (e.g., Number of hidden layers 3, Batch size 256, Optimizer SGD, Learning rate 1e-2). Table 3: Hyperparameters of ResNet for Carla and UTKface datasets. (e.g., Epochs 100, Batch size 512, Optimizer SGD, Learning rate 0.001).
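
The NRC1 and NRC2 properties quoted in the Research Type row describe last-layer features collapsing onto a low-dimensional subspace. The sketch below shows one way such collapse could be measured with NumPy. It is an illustration under stated assumptions, not the authors' code: the function names `nrc1_residual` and `nrc2_residual`, the centering and unit-normalization of the features, and the use of a mean squared projection residual are all choices made here for illustration. `H` is assumed to be an (M, d) array of last-layer features and `W` an (n, d) array of last-layer weights with full row rank.

```python
import numpy as np


def _center_and_normalize(H, eps=1e-12):
    """Center features across samples and scale each row to unit norm (assumed preprocessing)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(Hc, axis=1, keepdims=True)
    return Hc / np.maximum(norms, eps)


def _mean_projection_residual(Hn, basis):
    """Mean squared distance from rows of Hn to the subspace spanned by the orthonormal columns of basis."""
    projected = (Hn @ basis) @ basis.T
    return float(np.mean(np.sum((Hn - projected) ** 2, axis=1)))


def nrc1_residual(H, n):
    """Residual of features outside the span of their top-n principal components (hypothetical NRC1-style metric)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions of the centered features.
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    basis = Vt[:n].T  # shape (d, n), orthonormal columns
    return _mean_projection_residual(_center_and_normalize(H), basis)


def nrc2_residual(H, W):
    """Residual of features outside the span of the weight vectors, i.e. the rows of W (hypothetical NRC2-style metric)."""
    # Reduced QR of W.T gives an orthonormal basis for the row space of W.
    basis, _ = np.linalg.qr(W.T)  # shape (d, n)
    return _mean_projection_residual(_center_and_normalize(H), basis)
```

Under these assumptions, both values lie between 0 and 1, and values near 0 indicate that the features have collapsed onto the corresponding subspace. Tracking them over training epochs would be one way to observe the convergence the paper reports.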