On the Limitations of Fractal Dimension as a Measure of Generalization
Authors: Charlie Tan, Inés García-Redondo, Qiquan Wang, Michael Bronstein, Anthea Monod
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our paper constitutes an extended empirical evaluation of the performance and viability of these proposed topological measures of generalization; in particular, robustness and failure modes are explored in a wider range of experiments than those considered by Birdal et al. [2021] and Dupuis et al. [2023]. |
| Researcher Affiliation | Academia | Charlie B. Tan (University of Oxford); Inés García-Redondo (Imperial College London); Qiquan Wang (Imperial College London); Michael M. Bronstein (University of Oxford / Aithyra); Anthea Monod (Imperial College London) |
| Pseudocode | No | The paper describes experimental procedures and refers to 'Algorithm 1 by Birdal et al. [2021]' but does not contain any pseudocode or clearly labeled algorithm blocks within its own content. |
| Open Source Code | Yes | Code provided for all experiments at: https://github.com/charliebtan/fractal_dimensions |
| Open Datasets | Yes | We employ the same datasets and architectures as Dupuis et al. [2023]: (i) fully-connected networks of 5 (FCN-5) and 7 (FCN-7) layers on the California housing dataset (CHD) [Kelley Pace and Barry, 1997]; (ii) FCN-5 and FCN-7 on the MNIST dataset [Lecun et al., 1998]; and (iii) AlexNet [Krizhevsky et al., 2017] on the CIFAR-10 dataset [Krizhevsky, 2009]. |
| Dataset Splits | No | The paper describes convergence criteria based on empirical risk on the full training dataset and 100% training accuracy, but does not explicitly mention the use of a separate validation dataset or its split percentages for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | All experiments were run on high performance computing clusters using GPU nodes with Quadro RTX 6000 (128 CPU cores) or NVIDIA H100 (192 CPU cores). |
| Software Dependencies | No | The paper mentions reliance on 'the TDA software Giotto-TDA' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | Our experiments closely follow the setting of Dupuis et al. [2023]. We train with SGD until convergence, then continue for 5000 additional iterations... In keeping with the assumptions of this theory, we omit explicit regularization such as dropout or weight decay, and maintain constant learning rates in all experiments... For CHD, learning rates are logarithmically spaced between 0.001 and 0.01. Batch sizes take values {32, 65, 99, 132, 166, 200}. |
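
The Experiment Setup row above describes a grid over logarithmically spaced learning rates and fixed batch sizes, trained with plain SGD at a constant learning rate and without explicit regularization. The snippet below is a minimal sketch of that configuration for the CHD setting; the choice of 6 learning-rate values (mirroring the 6 batch sizes), the placeholder model, and the PyTorch optimizer call are assumptions for illustration only. The authors' repository (https://github.com/charliebtan/fractal_dimensions) contains the actual experiment code.

```python
# Sketch of the CHD hyperparameter grid quoted in the Experiment Setup row.
# Assumption: 6 learning-rate values, matching the 6 quoted batch sizes.
import itertools

import numpy as np
import torch

# Learning rates logarithmically spaced between 0.001 and 0.01 (CHD setting).
learning_rates = np.logspace(np.log10(0.001), np.log10(0.01), num=6)

# Batch sizes quoted for the CHD experiments.
batch_sizes = [32, 65, 99, 132, 166, 200]

# Cartesian product of the two axes gives the experiment grid.
grid = list(itertools.product(learning_rates, batch_sizes))
print(f"{len(grid)} (learning rate, batch size) configurations")

# Placeholder model; the paper uses FCN-5/FCN-7 and AlexNet architectures.
model = torch.nn.Linear(8, 1)  # the California housing dataset has 8 input features

# Constant learning rate, no weight decay, matching the quoted setup
# (no explicit regularization, no learning-rate schedule).
lr, batch_size = grid[0]
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0, weight_decay=0.0)
```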