Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

The Prevalence of Neural Collapse in Neural Multivariate Regression

Authors: George Andriopoulos, Zixuan Dong, Li Guo, Zifan Zhao, Keith Ross

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the n principal components of the feature vectors, where n is the dimension of the targets (for univariate regression, n = 1); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM)
Researcher Affiliation	Academia	George Andriopoulos (1), Zixuan Dong (2, 4), Li Guo (3), Zifan Zhao (3), Keith Ross (1). (1) New York University Abu Dhabi; (2) SFSC of AI and DL, NYU Shanghai; (3) New York University Shanghai; (4) New York University
Pseudocode	No	The paper does not contain any pseudocode or algorithm blocks.
Open Source Code	Yes	We upload the code with environment in the supplemental materials.
Open Datasets	Yes	The Swimmer, Reacher, and Hopper datasets are based on MuJoCo [Todorov et al., 2012, Brockman et al., 2016, Towers et al., 2023], a physics engine that simulates diverse continuous multi-joint robot controls and has been a canonical benchmark for deep reinforcement learning research. In our experiments, we use publicly available expert datasets (see appendix A.1). ... The CARLA dataset originates from the CARLA Simulator, an open-source project designed to support the development of autonomous driving systems. We utilize a dataset Codevilla et al. [2018] ... The UTKFace dataset [Zhang et al., 2017] is widely used in computer vision to study age estimation from facial images of humans.
Dataset Splits	Yes	For each environment, we also take a subset of the full validation (test) dataset and keep the number of data 20% of training data size.
Hardware Specification	Yes	Compute resources: Intel(R) Xeon(R) Platinum 8268 CPU (from Table 2); Compute resources: NVIDIA A100 8358 80GB (from Table 3)
Software Dependencies	No	The paper mentions software tools like MuJoCo and ResNet, but does not specify version numbers for any libraries, frameworks, or programming languages used in the experiments.
Experiment Setup	Yes	Table 2: Hyperparameter settings for experiments with weight decay on MuJoCo datasets. (e.g., Number of hidden layers 3, Batch size 256, Optimizer SGD, Learning rate 1e-2). Table 3: Hyperparameters of ResNet for Carla and UTKface datasets. (e.g., Epochs 100, Batch size 512, Optimizer SGD, Learning rate 0.001).
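The NRC1 and NRC2 properties quoted in the Research Type row can be checked numerically on a trained network's last-layer features. The following is a minimal sketch, not the authors' released code. The function names nrc1_residual and nrc2_residual are hypothetical. It assumes H is an (M, d) array of last-layer feature vectors and W is an (n, d) last-layer weight matrix with full row rank. It also assumes each metric is the fraction of centered feature energy lying outside the relevant subspace (0 means full collapse); the paper's exact normalization may differ.

```python
import numpy as np


def _center(H: np.ndarray) -> np.ndarray:
    """Subtract the global feature mean (assumption: metrics use centered features)."""
    return H - H.mean(axis=0, keepdims=True)


def nrc1_residual(H: np.ndarray, n: int) -> float:
    """Hypothetical NRC1 score: share of centered feature energy outside the
    span of the top-n principal components (n = target dimension)."""
    Hc = _center(H)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    Vn = Vt[:n]                         # (n, d) orthonormal basis
    resid = Hc - (Hc @ Vn.T) @ Vn       # component orthogonal to the top-n PCs
    return float(np.sum(resid ** 2) / np.sum(Hc ** 2))


def nrc2_residual(H: np.ndarray, W: np.ndarray) -> float:
    """Hypothetical NRC2 score: share of centered feature energy outside the
    row space of the (n, d) weight matrix W (assumed full row rank)."""
    Hc = _center(H)
    # Reduced QR of W^T gives a (d, n) orthonormal basis of W's row space.
    Q, _ = np.linalg.qr(W.T)
    resid = Hc - (Hc @ Q) @ Q.T
    return float(np.sum(resid ** 2) / np.sum(Hc ** 2))


# Synthetic demo: collapsed features vs. isotropic random features.
rng = np.random.default_rng(0)
M, d, n = 500, 16, 2
B, _ = np.linalg.qr(rng.standard_normal((d, n)))   # random n-dim subspace of R^d
H_collapsed = (rng.standard_normal((M, n)) @ B.T
               + 1e-3 * rng.standard_normal((M, d))  # small off-subspace noise
               + 5.0)                                 # constant offset, removed by centering
W = rng.standard_normal((n, n)) @ B.T                 # weights spanning the same subspace
H_random = rng.standard_normal((M, d))

print(f"collapsed: NRC1={nrc1_residual(H_collapsed, n):.2e}, NRC2={nrc2_residual(H_collapsed, W):.2e}")
print(f"random:    NRC1={nrc1_residual(H_random, n):.3f}, NRC2={nrc2_residual(H_random, W):.3f}")
```

On the synthetic collapsed features both residuals should be close to 0, while on isotropic random features they remain large (roughly 1 - n/d). Centering matters here: without it, the constant offset of 5.0 added to the synthetic features would dominate both scores.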