The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting. |
| Researcher Affiliation | Industry | Michael Munn (Google Research, munn@google.com); Benoit Dherin (Google Research, dherin@google.com); Javier Gonzalvo (Google Research, xavigonzalvo@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper explicitly states in its NeurIPS Paper Checklist that code is not open source: "[No] Justification: Yes, while the code is not open source, the paper uses open datasets, which are well-known benchmark datasets for image processing, such as CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST and CIFAR-FS and mini-ImageNet for transfer learning." |
| Open Datasets | Yes | standard one involves two stages. In a first stage, called pre-training, one trains a deep neural network on a general, large-scale dataset in the form of a supervised or unsupervised source task; e.g., ImageNet or CIFAR-100 [14, 33] for image models or the Common Crawl, C4 or LM1B datasets [8, 11, 49] for language models. |
| Dataset Splits | Yes | We trained a VGG-13 neural network on the full CIFAR-10 dataset with the provided architecture [56] and using the standard train/test split. Throughout training, we reported the following metrics measured and averaged over multiple batches of the training dataset: 1) the geometric complexity of the model embedding layer (a sketch of this metric is given below the table) |
| Hardware Specification | Yes | Each sweep took roughly 10h of training on a single Google Cloud TPU v3 accessed via a Google Colab. |
| Software Dependencies | No | A.4.3 mentions "We trained a ResNet-18 neural network with width 1 implemented in Flax https://github.com/google/flax/blob/main/examples/imagenet/models.py". However, it does not specify version numbers for Flax or for other software dependencies such as JAX or TensorFlow, which are mentioned as frameworks used in the work. |
| Experiment Setup | Yes | Top row: We swept over a learning rate range of {0.001, 0.0025, 0.005, 0.01, 0.025, 0.1} with a constant batch size of 512. Middle row: We swept over a batch size range of {8, 16, 32, 64, 128, 256} with a constant learning rate of 0.01. Bottom row: We swept over a L2 regularization rate range of {0.0, 0.00025, 0.0005, 0.001, 0.0025} with learning rate 0.01 and batch size 256. (A configuration sketch of these sweeps follows the table.) |
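
The Dataset Splits row reports the geometric complexity of the model's embedding layer as a tracked metric. Below is a minimal JAX sketch of how that quantity could be estimated over a batch, assuming the standard definition from the geometric-complexity literature (the mean squared Frobenius norm of the input-output Jacobian of the embedding map); `embed_fn`, `params`, and `batch` are hypothetical placeholders, not the authors' code.

```python
import jax
import jax.numpy as jnp

def geometric_complexity(embed_fn, params, batch):
    """Estimate the geometric complexity of an embedding map over a batch.

    Assumes GC = mean over inputs of the squared Frobenius norm of the
    Jacobian of the embedding with respect to the input. `embed_fn(params, x)`
    is a hypothetical function mapping a single input to its embedding vector.
    """
    def sq_frobenius_norm_of_jacobian(x):
        # Jacobian of the embedding output with respect to one input example.
        jac = jax.jacrev(lambda inp: embed_fn(params, inp))(x)
        return jnp.sum(jac ** 2)

    # Average the per-example squared Frobenius norms over the batch.
    return jnp.mean(jax.vmap(sq_frobenius_norm_of_jacobian)(batch))
```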
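
The Experiment Setup row quotes three hyperparameter sweeps. The dictionary below restates those ranges as a plain Python configuration sketch; the names and structure are illustrative assumptions rather than the authors' tooling, and values not stated in the excerpt are omitted.

```python
# Illustrative restatement of the sweeps quoted in the Experiment Setup row.
sweeps = {
    "learning_rate_sweep": {
        "learning_rate": [0.001, 0.0025, 0.005, 0.01, 0.025, 0.1],
        "batch_size": 512,
    },
    "batch_size_sweep": {
        "learning_rate": 0.01,
        "batch_size": [8, 16, 32, 64, 128, 256],
    },
    "l2_regularization_sweep": {
        "learning_rate": 0.01,
        "batch_size": 256,
        "l2_regularization": [0.0, 0.00025, 0.0005, 0.001, 0.0025],
    },
}
```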