The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting. |
| Researcher Affiliation | Industry | Michael Munn (Google Research, munn@google.com); Benoit Dherin (Google Research, dherin@google.com); Javier Gonzalvo (Google Research, xavigonzalvo@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper explicitly states in its NeurIPS Paper Checklist that code is not open source: "[No] Justification: Yes, while the code is not open source, the paper uses open datasets, which are well-known benchmark datasets for image processing, such as CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST and CIFAR-FS and mini-ImageNet for transfer learning." |
| Open Datasets | Yes | standard one involves two stages. In a first stage, called pre-training, one trains a deep neural network on a general, large-scale dataset in the form of a supervised or unsupervised source task; e.g., ImageNet or CIFAR-100 [14, 33] for image models or the Common Crawl, C4 or LM1B datasets [8, 11, 49] for language models. |
| Dataset Splits | Yes | We trained a VGG-13 neural network on the full CIFAR-10 dataset with the provided architecture [56] and using the standard train/test split. Throughout training, we reported the following metrics measured and averaged over multiple batches of the training dataset: 1) the geometric complexity of the model embedding layer (a sketch of this metric is given below the table) |
| Hardware Specification | Yes | Each sweep took roughly 10h of training on a single Google Cloud TPU v3 accessed via a Google Colab. |
| Software Dependencies | No | A.4.3 mentions "We trained a ResNet-18 neural network with width 1 implemented in Flax https://github.com/google/flax/blob/main/examples/imagenet/models.py". However, it does not specify version numbers for Flax or for other software dependencies such as JAX or TensorFlow, which are mentioned as frameworks used in the work. |
| Experiment Setup | Yes | Top row: We swept over a learning rate range of {0.001, 0.0025, 0.005, 0.01, 0.025, 0.1} with a constant batch size of 512. Middle row: We swept over a batch size range of {8, 16, 32, 64, 128, 256} with a constant learning rate of 0.01. Bottom row: We swept over a L2 regularization rate range of {0.0, 0.00025, 0.0005, 0.001, 0.0025} with learning rate 0.01 and batch size 256. (A configuration sketch of these sweeps follows the table.) |
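
The Dataset Splits row reports the geometric complexity of the model's embedding layer as a tracked metric. Below is a minimal JAX sketch of how that quantity could be estimated over a batch, assuming the standard definition from the geometric-complexity literature (the mean squared Frobenius norm of the input-output Jacobian of the embedding map); `embed_fn`, `params`, and `batch` are hypothetical placeholders, not the authors' code.

```python
import jax
import jax.numpy as jnp

def geometric_complexity(embed_fn, params, batch):
    """Estimate the geometric complexity of an embedding map over a batch.

    Assumes GC = mean over inputs of the squared Frobenius norm of the
    Jacobian of the embedding with respect to the input. `embed_fn(params, x)`
    is a hypothetical function mapping a single input to its embedding vector.
    """
    def sq_frobenius_norm_of_jacobian(x):
        # Jacobian of the embedding output with respect to one input example.
        jac = jax.jacrev(lambda inp: embed_fn(params, inp))(x)
        return jnp.sum(jac ** 2)

    # Average the per-example squared Frobenius norms over the batch.
    return jnp.mean(jax.vmap(sq_frobenius_norm_of_jacobian)(batch))
```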
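
The Experiment Setup row quotes three hyperparameter sweeps. The dictionary below restates those ranges as a plain Python configuration sketch; the names and structure are illustrative assumptions rather than the authors' tooling, and values not stated in the excerpt are omitted.

```python
# Illustrative restatement of the sweeps quoted in the Experiment Setup row.
sweeps = {
    "learning_rate_sweep": {
        "learning_rate": [0.001, 0.0025, 0.005, 0.01, 0.025, 0.1],
        "batch_size": 512,
    },
    "batch_size_sweep": {
        "learning_rate": 0.01,
        "batch_size": [8, 16, 32, 64, 128, 256],
    },
    "l2_regularization_sweep": {
        "learning_rate": 0.01,
        "batch_size": 256,
        "l2_regularization": [0.0, 0.00025, 0.0005, 0.001, 0.0025],
    },
}
```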