Intrinsic dimension of data representations in deep neural networks
Authors: Alessio Ansuini, Alessandro Laio, Jakob H. Macke, Davide Zoccolan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we study the intrinsic dimensionality (ID) of data representations, i.e. the minimal number of parameters needed to describe a representation. We find that, in a trained network, the ID is orders of magnitude smaller than the number of units in each layer. Across layers, the ID first increases and then progressively decreases in the final layers. Remarkably, the ID of the last hidden layer predicts classification accuracy on the test set. These results can neither be found by linear dimensionality estimates (e.g., with principal component analysis), nor in representations that had been artificially linearized. They are neither found in untrained networks, nor in networks that are trained on randomized labels. This suggests that neural networks that can generalize are those that transform the data into low-dimensional, but not necessarily flat manifolds. |
| Researcher Affiliation | Academia | Alessio Ansuini International School for Advanced Studies alessioansuini@gmail.com Alessandro Laio International School for Advanced Studies laio@sissa.it Jakob H. Macke Technical University of Munich macke@tum.de Davide Zoccolan International School for Advanced Studies zoccolan@sissa.it |
| Pseudocode | Yes | Figure 1: The TwoNN estimator derives an estimate of intrinsic dimensionality from the statistics of nearest-neighbour distances. 1) For each data point i, compute the distances to its first and second neighbours (ri,1 and ri,2). 2) For each i, compute μi = ri,2/ri,1. ... 3) Infer d from the empirical probability distribution of all the μi. 4) Repeat the calculation selecting a fraction of points at random. This gives the ID as a function of the scale. |
| Open Source Code | Yes | The code to compute the ID estimates with the TwoNN method and to reproduce our experiments is available at this repository. |
| Open Datasets | Yes | We first investigated the variation of the ID across the layers of a VGG-16 network (20), pre-trained on ImageNet (11), and fine-tuned and evaluated on a synthetic dataset of 1440 images (21). ...computed the average ID of the object manifolds corresponding to the 7 biggest ImageNet categories, using 500 images per category... we generated a modified MNIST dataset (referred to as MNIST )...(38) Y. LeCun and C. Cortes, MNIST handwritten digit database, 2010. |
| Dataset Splits | No | The paper mentions leaving out a 'test set' for the synthetic dataset and discusses performance 'without estimating the performance on an external validation set', indicating that explicit training/validation splits are not provided or used in the conventional sense for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions performing calculations 'In a few seconds on a desktop PC' but does not provide specific hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper cites the PyTorch framework (37) but does not explicitly state the specific version numbers of any software libraries or dependencies used in their experiments. |
| Experiment Setup | Yes | We extracted representations at pooling layers after a convolution or a block of consecutive convolutions, and at fully connected layers. In the experiments with ResNets, we extracted the representations after each ResNet block (19) and the average pooling before the output. ... we generated a modified MNIST dataset (referred to as MNIST ) by adding a luminance perturbation... with λ = 100... |
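The four-step TwoNN recipe quoted in the Pseudocode row can be sketched in a few lines. This is not the authors' released code; it is a minimal brute-force illustration (step 3 is resolved via the standard maximum-likelihood estimate d = N / Σᵢ log μᵢ implied by the Pareto distribution of the ratios, and the function name `twonn_id` is our own):

```python
import numpy as np

def twonn_id(X):
    """Sketch of the TwoNN intrinsic-dimension estimator.
    For each point, take the ratio mu_i = r2/r1 of the distances to its
    second and first nearest neighbours; mu follows a Pareto law with
    exponent equal to the intrinsic dimension d, giving the MLE
    d = N / sum(log mu_i)."""
    X = np.asarray(X, dtype=float)
    # Step 1: pairwise distances (brute force; fine for a few thousand points)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude each point's self-distance
    D.sort(axis=1)
    r1, r2 = D[:, 0], D[:, 1]            # first and second neighbour distances
    mu = r2 / r1                         # Step 2: distance ratios
    # Step 3: ML estimate of d from the empirical distribution of mu
    return len(X) / np.log(mu).sum()

# Sanity check: a 2-D manifold embedded in 10-D ambient space
rng = np.random.default_rng(0)
Z = rng.uniform(size=(1000, 2))
X = np.hstack([Z, np.zeros((1000, 8))])
print(twonn_id(X))                       # close to 2, far below the 10 ambient dims
```

Step 4 of the recipe (re-estimating on random subsamples to probe scale dependence) amounts to calling `twonn_id` on random subsets of decreasing size.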