Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
Authors: Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Simsekli
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers. |
| Researcher Affiliation | Academia | Rayna Andreeva¹, Benjamin Dupuis²,³, Rik Sarkar¹, Tolga Birdal⁴, Umut Şimşekli²,³ (¹School of Informatics, University of Edinburgh, UK; ²INRIA, France; ³CNRS, École Normale Supérieure, PSL Research University, France; ⁴Department of Computing, Imperial College London, UK) |
| Pseudocode | No | The paper describes algorithms and methods (e.g., 'simple yet effective algorithms for computing generalization indices', a 'Krylov approximation method') but does not provide structured pseudocode blocks or formally labeled 'Algorithm X' sections. A hedged sketch of one such computation appears after this table. |
| Open Source Code | Yes | We will make our entire implementation publicly available under: https://github.com/rorondre/TDAGeneralization. |
| Open Datasets | Yes | We consider several vision transformers [20] and graph neural networks (GNN) [30] trained on multiple datasets spanning regular and irregular data domains (cf. Figure 1c). ...CIFAR10 [44] and CIFAR100 [43] datasets. ...Super-pixel MNIST dataset [23]. |
| Dataset Splits | No | The paper describes varying hyperparameters (learning rate, batch size) on a grid for training and evaluating test risk, but does not explicitly mention or use a separate 'validation' dataset or split for model selection or early stopping. |
| Hardware Specification | Yes | We ran the experiments on 18 NVIDIA 2080Ti (11 GB) GPUs. ...All experiments were run on 18 Intel Xeon Silver 4114 CPUs. |
| Software Dependencies | No | The paper mentions software such as 'giotto-ph', 'krypy.linsys.Cg', 'scikit-learn', the 'timm' library, and the 'dgl' library, but does not provide version numbers for any of these dependencies. An illustrative pinned-requirements sketch follows the table. |
| Experiment Setup | Yes | By varying the learning rate (η) and the batch size (b), we define a grid of 6 × 6 hyperparameters. For each pair (η, b), we compute the training trajectory W_{t0,T} for 5 × 10^3 iterations. Unless specified, we use the ADAM optimizer [40]. ...For all experiments, we used a 0.1 proportion of the training data for the computation of the pseudo matrix, apart from CaiT and Swin on CIFAR100, where we used a 0.09 proportion due to memory constraints. A minimal sketch of this grid setup follows the table. |
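
Since the paper names its software stack without versions, a reproduction would need to pin them explicitly. The versions below are illustrative placeholders chosen by us, not versions stated in the paper:

```text
# requirements.txt (illustrative pins only; the paper does not state versions)
giotto-ph==0.2.2
krypy==2.2.0
scikit-learn==1.3.2
timm==0.9.12
dgl==1.1.2
```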
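For the experiment-setup row, here is a minimal sketch of the 6 × 6 (η, b) grid with trajectory recording. Only the grid shape, the ADAM optimizer, and the 5 × 10^3 iteration count come from the quoted text; the grid values, the toy model, and the synthetic data are our placeholders, not the paper's configuration:

```python
import itertools

import torch
from torch import nn

# Placeholder 6 x 6 grid: the paper varies (learning rate, batch size) on a
# 6 x 6 grid, but the exact values are not quoted above, so these are assumptions.
learning_rates = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3]
batch_sizes = [16, 32, 64, 128, 256, 512]

# Synthetic stand-in for the real datasets (CIFAR10/100, Super-pixel MNIST).
X = torch.randn(2048, 32)
y = torch.randint(0, 10, (2048,))

for lr, bs in itertools.product(learning_rates, batch_sizes):
    # Toy model standing in for the paper's vision transformers / GNNs.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # ADAM, as quoted
    loss_fn = nn.CrossEntropyLoss()
    trajectory = []  # flattened iterates w_t; input to the topological measures
    for step in range(5_000):  # 5 x 10^3 iterations, as in the quoted setup
        idx = torch.randint(0, X.shape[0], (bs,))
        loss = loss_fn(model(X[idx]), y[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 100 == 0:  # subsample iterates to keep the point cloud small
            trajectory.append(
                torch.cat([p.detach().flatten() for p in model.parameters()])
            )
```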
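For the pseudocode row, here is a hedged sketch of one topological quantity computable from such a trajectory with giotto-ph: the α-weighted sum of 0-dimensional persistence lifetimes over the recorded iterates, a standard ingredient of persistent-homology-based generalization measures. Treating this as one of the paper's 'generalization indices' is our assumption, and the function name and defaults are ours:

```python
import numpy as np
from gph import ripser_parallel  # giotto-ph


def alpha_weighted_lifetime_sum(points: np.ndarray, alpha: float = 1.0) -> float:
    """alpha-weighted sum of H0 lifetimes of a point cloud of iterates.

    `points` has shape (T, d): one flattened weight vector per recorded iterate.
    """
    # 0-dimensional persistence diagram of the Vietoris-Rips filtration.
    dgm0 = ripser_parallel(points, maxdim=0)["dgms"][0]
    finite = dgm0[np.isfinite(dgm0[:, 1])]  # drop the single infinite H0 bar
    lifetimes = finite[:, 1] - finite[:, 0]
    return float(np.sum(lifetimes ** alpha))


# E.g., applied to the trajectory recorded in the previous sketch:
# score = alpha_weighted_lifetime_sum(np.stack([w.numpy() for w in trajectory]))
```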