A two-scale Complexity Measure for Deep Learning Models
Authors: Massimiliano Datres, Gian Paolo Leonardi, Alessio Figalli, David Sutter
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental evidence that the behavior of the loss in the training of given parametric models is related both to the 2s ED (4) and the lower 2s ED (10). We compute the 2s ED dζ and the lower 2s ED d̲ζ of different feed-forward neural networks (FNNs) such as convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). The experiments rely on the computation of the exact eigenvalues via the numpy.linalg.eig function. As discussed in Section 8, an interesting research direction (which goes beyond the goal of this paper) would be to investigate strategies to efficiently compute the (lower) 2s ED via a suitable approximation of the spectrum of F. The choice of feed-forward neural networks is justified by their architecture, characterized by a Markovian dependency structure. Indeed, the flow of information in an FNN is unidirectional from input to output, making it representable as a finite acyclic graph. We evaluate dζ and d̲ζ on real-world datasets, including the Covertype dataset [6], the MNIST dataset [11], and CIFAR10 [18]. |
| Researcher Affiliation | Collaboration | Massimiliano Datres1,2, Gian Paolo Leonardi1, Alessio Figalli3, David Sutter4 1 Department of Mathematics, University of Trento, Trento 2 DSH, Fondazione Bruno Kessler, Trento 3 Department of Mathematics, ETH Zurich 4 IBM Quantum, IBM Research Europe, Zurich |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm", nor does it present structured algorithm blocks. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code...? Answer: [No]. Justification: The main contribution of this paper is a theoretical result on complexity measures for deep learning models. Datasets are public and referenced in the experiments section, while the code can easily be reproduced using the definitions and the experimental setting described in the main body of this work. |
| Open Datasets | Yes | We evaluate dζ and d̲ζ on real-world datasets, including the Covertype dataset [6], the MNIST dataset [11], and CIFAR10 [18]. |
| Dataset Splits | No | The paper mentions using 100 samples and 100 vectors for Monte Carlo estimation, and training with 10000 or 100000 data points. However, it does not specify explicit train/validation splits or percentages for overall model training and evaluation; it only refers to training data and to samples used for specific estimations. |
| Hardware Specification | Yes | All simulations are conducted on a 12th Gen Intel(R) Core(TM) i9-12900KF equipped with an NVIDIA GeForce RTX 4090. |
| Software Dependencies | No | The paper mentions using "numpy.linalg.eig function" and "Adam optimizer" but does not specify version numbers for these software components or any other libraries/packages. |
| Experiment Setup | Yes | Training loss plots of MLPs on 10000 random Covertype samples using Adam with learning rate 1e-3 and a batch size 64; ... Training loss plots of CNNs on CIFAR10 using Adam optimizer with learning rate 1e-3 and a batch size 512. |
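The "Research Type" excerpt notes that the experiments compute exact eigenvalue spectra via `numpy.linalg.eig`, from which the (lower) 2s ED is then evaluated. A minimal sketch of that spectral step, assuming F is a symmetric positive semi-definite matrix (the Gram-style matrix below is a hypothetical stand-in; the paper's actual F is defined by its own construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for F: a small symmetric PSD Gram matrix built
# from Jacobian-like samples. The paper's F comes from its own definitions.
J = rng.standard_normal((100, 5))
F = J.T @ J / J.shape[0]

# Exact eigenvalues via numpy.linalg.eig, as quoted from the paper.
eigvals = np.linalg.eig(F)[0].real

# Sort in decreasing order; the (lower) 2s ED would then be computed
# from this spectrum according to the paper's definitions (4) and (10).
eigvals = np.sort(eigvals)[::-1]
```

Since F here is symmetric, `numpy.linalg.eigh` would be the more natural (and numerically safer) choice; `eig` is shown only because it is the function the paper quotes.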
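The "Experiment Setup" row quotes Adam with learning rate 1e-3 and batch size 64. As a self-contained illustration of that configuration, here is a minimal NumPy implementation of the Adam update applied to a toy quadratic loss; the model and loss are illustrative stand-ins, not the paper's MLPs or CNNs, which would be trained with a deep-learning framework's Adam:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased moment estimates, bias correction, step."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

rng = np.random.default_rng(0)
w = rng.standard_normal(10)          # toy parameter vector
m = np.zeros_like(w)
v = np.zeros_like(w)

losses = []
for t in range(1, 2001):
    X = rng.standard_normal((64, 10))        # batch size 64, as quoted
    g = 2 * X.T @ (X @ w) / 64               # gradient of mean ||X w||^2
    w, m, v = adam_step(w, g, m, v, t)       # learning rate 1e-3, as quoted
    losses.append(float(np.mean((X @ w) ** 2)))
```

With the quoted learning rate of 1e-3, the loss decreases steadily over the run, since w = 0 minimizes the toy objective.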