The Prevalence of Neural Collapse in Neural Multivariate Regression
Authors: George Andriopoulos, Zixuan Dong, Li Guo, Zifan Zhao, Keith Ross
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the n principal components of the feature vectors, where n is the dimension of the targets (for univariate regression, n = 1); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM) |
| Researcher Affiliation | Academia | George Andriopoulos (1), Zixuan Dong (2, 4), Li Guo (3), Zifan Zhao (3), Keith Ross (1). Affiliations: (1) New York University Abu Dhabi; (2) SFSC of AI and DL, NYU Shanghai; (3) New York University Shanghai; (4) New York University. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We upload the code with environment in the supplemental materials. |
| Open Datasets | Yes | The Swimmer, Reacher, and Hopper datasets are based on MuJoCo [Todorov et al., 2012, Brockman et al., 2016, Towers et al., 2023], a physics engine that simulates diverse continuous multi-joint robot controls and has been a canonical benchmark for deep reinforcement learning research. In our experiments, we use publicly available expert datasets (see Appendix A.1). ... The CARLA dataset originates from the CARLA Simulator, an open-source project designed to support the development of autonomous driving systems. We utilize a dataset from Codevilla et al. [2018] ... The UTKFace dataset [Zhang et al., 2017] is widely used in computer vision to study age estimation from facial images of humans. |
| Dataset Splits | Yes | For each environment, we also take a subset of the full validation (test) dataset whose size is 20% of the training set size. |
| Hardware Specification | Yes | Compute resources: Intel(R) Xeon(R) Platinum 8268 CPU (from Table 2); NVIDIA A100 8358 80GB (from Table 3). |
| Software Dependencies | No | The paper mentions software tools like MuJoCo and ResNet, but does not specify version numbers for any libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Table 2: Hyperparameter settings for experiments with weight decay on MuJoCo datasets (e.g., number of hidden layers 3, batch size 256, optimizer SGD, learning rate 1e-2). Table 3: Hyperparameters of ResNet for CARLA and UTKFace datasets (e.g., epochs 100, batch size 512, optimizer SGD, learning rate 0.001). |
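
The NRC1 and NRC2 criteria quoted in the Research Type row are geometric statements about last-layer features, so they can be checked numerically. The following is a minimal NumPy sketch, not the paper's code: it assumes features are centered by their global mean and that each metric is reported as the fraction of squared feature norm lying outside the relevant subspace (the top-n principal subspace for NRC1, the span of the n weight vectors for NRC2). This normalization and the names `nrc1` and `nrc2` are hypothetical illustrations. NRC3 is omitted because its exact functional form is not stated here.

```python
import numpy as np


def _center(H: np.ndarray) -> np.ndarray:
    """Subtract the global feature mean (assumption: metrics use centered features)."""
    return H - H.mean(axis=0, keepdims=True)


def nrc1(H: np.ndarray, n: int) -> float:
    """Hypothetical NRC1 score: fraction of centered feature energy outside the
    subspace spanned by the top-n principal components.

    H: (num_samples, d) last-layer features; n: target dimension.
    """
    Hc = _center(H)
    # Right singular vectors of the centered features are the principal directions.
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    V_n = Vt[:n].T                      # (d, n) orthonormal basis of the top-n PCs
    residual = Hc - Hc @ V_n @ V_n.T    # component orthogonal to that subspace
    return float(np.sum(residual**2) / np.sum(Hc**2))


def nrc2(H: np.ndarray, W: np.ndarray) -> float:
    """Hypothetical NRC2 score: fraction of centered feature energy outside the
    subspace spanned by the last-layer weight vectors.

    W: (n, d) last-layer weight matrix, one row per target dimension.
    """
    Hc = _center(H)
    # Orthonormal basis for the row space of W via a reduced QR of W^T.
    Q, _ = np.linalg.qr(W.T)            # (d, n)
    residual = Hc - Hc @ Q @ Q.T
    return float(np.sum(residual**2) / np.sum(Hc**2))


# Synthetic check: features built inside span(W) versus generic random features.
rng = np.random.default_rng(0)
d, n, N = 32, 3, 500
W = rng.standard_normal((n, d))
H_collapsed = rng.standard_normal((N, n)) @ W   # every feature lies in span(W)
H_generic = rng.standard_normal((N, d))         # no collapse

print(nrc1(H_collapsed, n), nrc2(H_collapsed, W))  # both close to 0
print(nrc1(H_generic, n), nrc2(H_generic, W))      # both well above 0
```

Both scores lie in [0, 1]; under collapse they approach 0. Using an SVD for NRC1 and a QR factorization for NRC2 avoids explicitly inverting W Wᵀ when projecting onto the weight subspace.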