On the Limitations of Fractal Dimension as a Measure of Generalization

Authors: Charlie Tan, Inés García-Redondo, Qiquan Wang, Michael Bronstein, Anthea Monod

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our paper constitutes an extended empirical evaluation of the performance and viability of these proposed topological measures of generalization; in particular, robustness and failure modes are explored in a wider range of experiments than those considered by Birdal et al. [2021] and Dupuis et al. [2023].
Researcher Affiliation | Academia | Charlie B. Tan (University of Oxford); Inés García-Redondo (Imperial College London); Qiquan Wang (Imperial College London); Michael M. Bronstein (University of Oxford / Aithyra); Anthea Monod (Imperial College London)
Pseudocode | No | The paper describes experimental procedures and refers to 'Algorithm 1 by Birdal et al. [2021]', but does not contain any pseudocode or clearly labeled algorithm blocks within its own content.
Open Source Code | Yes | Code provided for all experiments at: https://github.com/charliebtan/fractal_dimensions
Open Datasets | Yes | We employ the same datasets and architectures as Dupuis et al. [2023]: (i) fully-connected networks of 5 (FCN-5) and 7 (FCN-7) layers on the California housing dataset (CHD) [Kelley Pace and Barry, 1997]; (ii) FCN-5 and FCN-7 on the MNIST dataset [LeCun et al., 1998]; and (iii) AlexNet [Krizhevsky et al., 2017] on the CIFAR-10 dataset [Krizhevsky, 2009]. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper describes convergence criteria based on empirical risk on the full training dataset and 100% training accuracy, but does not explicitly mention the use of a separate validation dataset or its split percentages for hyperparameter tuning or early stopping.
Hardware Specification | Yes | All experiments were run on high performance computing clusters using GPU nodes with Quadro RTX 6000 (128 CPU cores) or NVIDIA H100 (192 CPU cores).
Software Dependencies | No | The paper mentions reliance on 'the TDA software Giotto-TDA' but does not provide specific version numbers for this or any other software dependencies. (A minimal Giotto-TDA usage sketch follows the table.)
Experiment Setup | Yes | Our experiments closely follow the setting of Dupuis et al. [2023]. We train with SGD until convergence, then continue for 5000 additional iterations... In keeping with the assumptions of this theory, we omit explicit regularization such as dropout or weight decay, and maintain constant learning rates in all experiments... For CHD, learning rates are logarithmically spaced between 0.001 and 0.01. Batch sizes take values {32, 65, 99, 132, 166, 200}. (A training-configuration sketch follows the table.)
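
The dataset row above names the California housing dataset, MNIST, and CIFAR-10, all of which are available through standard Python libraries. The following is a hedged sketch of how they could be loaded for a re-run; the scikit-learn and torchvision loaders, root path, and transform are illustrative assumptions, not the authors' preprocessing.

```python
# Illustrative loaders for the three datasets cited in the paper; the authors'
# exact preprocessing and splits are not reproduced here.
from sklearn.datasets import fetch_california_housing
from torchvision import datasets, transforms

# California housing dataset (CHD), used with FCN-5 / FCN-7 regressors.
chd = fetch_california_housing()
X_chd, y_chd = chd.data, chd.target  # (20640, 8) feature matrix, scalar targets

to_tensor = transforms.ToTensor()

# MNIST, used with FCN-5 / FCN-7 classifiers.
mnist_train = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)

# CIFAR-10, used with AlexNet.
cifar_train = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
```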
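
The software-dependency row points to Giotto-TDA without a version number. As an illustration only of the library the paper relies on, a Vietoris-Rips persistence computation over a point cloud (in the paper, such clouds are built from optimization trajectories) might look as follows; the random point cloud, its dimensions, and the chosen homology dimensions are assumptions for the sketch.

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence

# Hypothetical point cloud: 200 points in 50 dimensions, standing in for a
# projected SGD weight trajectory; here the data is random and illustrative.
rng = np.random.default_rng(0)
point_cloud = rng.normal(size=(1, 200, 50))  # (n_samples, n_points, n_dims)

# Compute 0- and 1-dimensional Vietoris-Rips persistence diagrams, the kind of
# topological summary from which persistent-homology dimensions are estimated.
vr = VietorisRipsPersistence(homology_dimensions=(0, 1))
diagrams = vr.fit_transform(point_cloud)
print(diagrams.shape)  # (1, n_pairs, 3): (birth, death, homology dimension)
```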
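
The experiment-setup row fixes plain SGD with constant learning rates, no dropout or weight decay, and a grid over learning rates and batch sizes for CHD. A minimal configuration sketch is below; the number of grid points, the convergence check, and the optimizer call are assumptions used only to make the quoted settings concrete.

```python
import numpy as np
import torch

# Learning rates logarithmically spaced in [0.001, 0.01] (six values assumed
# here, to mirror the six batch sizes) and the batch sizes quoted in the table.
learning_rates = np.logspace(np.log10(0.001), np.log10(0.01), num=6)
batch_sizes = [32, 65, 99, 132, 166, 200]

def make_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.SGD:
    # Plain SGD with a constant learning rate and no weight decay, reflecting
    # the paper's stated omission of explicit regularization.
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0, weight_decay=0.0)

def converged(train_accuracy: float) -> bool:
    # Placeholder convergence check; the report cites 100% training accuracy as
    # part of the criterion, after which training continues for extra iterations.
    return train_accuracy >= 1.0

EXTRA_ITERATIONS = 5000  # iterations continued after convergence, per the quote
```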