Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

Authors: Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and it does not necessarily grow with the number of parameters unlike the existing capacity metrics in the literature. Experiments on both synthetic and real data verify that our bounds do not grow with the problem dimension, providing an accurate characterization of the generalization performance.
Researcher Affiliation | Collaboration | Umut Şimşekli (1,2), Ozan Sener (3), George Deligiannidis (2,4), Murat A. Erdogdu (5,6); affiliations: 1 LTCI, Télécom Paris, Institut Polytechnique de Paris; 2 University of Oxford; 3 Intel Labs; 4 The Alan Turing Institute; 5 University of Toronto; 6 Vector Institute
Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | The code can be found in https://github.com/umutsimsekli/Hausdorff-Dimension-and-Generalization.
Open Datasets | Yes | We train models on the CIFAR-10 dataset [KH09]
Dataset Splits | No | The paper states it uses the CIFAR-10 dataset and discusses training and testing, but it does not explicitly provide details about specific training/validation/test splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper describes the experiments performed (e.g., training VGG networks on CIFAR-10) but does not provide any specific details about the hardware used, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using VGG networks and SGD but does not specify any particular software libraries or frameworks (e.g., PyTorch, TensorFlow), nor their version numbers, used for the implementation.
Experiment Setup | Yes | We vary the number of layers from D = 4 to D = 19, resulting in a number of parameters d between 1.3M and 20M. We train models on the CIFAR-10 dataset using SGD and we choose various stepsizes η and batch sizes B. We trained all the models for 100 epochs.
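
The experiment setup in the last row is concrete enough to sketch in code. Below is a minimal sketch assuming a PyTorch/torchvision implementation (the paper names no framework, per the Software Dependencies row). The torchvision VGG variants, step sizes η, and batch sizes B in the grid are illustrative placeholders, since the paper only reports varying depth from D = 4 to D = 19 and using "various" step sizes and batch sizes; CIFAR-10 is loaded with its standard 50,000/10,000 train/test split.

```python
# Sketch of the described setup: VGG-style networks of varying depth trained
# on CIFAR-10 with plain SGD for 100 epochs, over a grid of step sizes and
# batch sizes. Assumes PyTorch/torchvision; the eta/B values and the choice of
# standard torchvision VGG variants are placeholders, not the paper's exact grid.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T


def make_loaders(batch_size):
    # CIFAR-10 ships with the standard 50,000/10,000 train/test split.
    tf = T.Compose([
        T.ToTensor(),
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=tf)
    test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=tf)
    train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test, batch_size=1000)
    return train_loader, test_loader


def train_one(model, loader, eta, epochs=100, device="cpu"):
    # Plain SGD with a constant step size eta, as in the paper's description.
    opt = torch.optim.SGD(model.parameters(), lr=eta)
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model


if __name__ == "__main__":
    # Grid over network depth (via standard torchvision VGG variants),
    # step size eta, and batch size B.
    vgg_variants = {
        "vgg11": torchvision.models.vgg11,
        "vgg16": torchvision.models.vgg16,
        "vgg19": torchvision.models.vgg19,
    }
    for name, ctor in vgg_variants.items():
        for eta in (0.01, 0.05, 0.1):      # placeholder step sizes
            for B in (50, 100, 500):       # placeholder batch sizes
                train_loader, test_loader = make_loaders(B)
                model = ctor(num_classes=10)
                train_one(model, train_loader, eta)
```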