Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
Authors: Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Simsekli
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers. |
| Researcher Affiliation | Academia | Rayna Andreeva¹, Benjamin Dupuis²,³, Rik Sarkar¹, Tolga Birdal⁴, Umut Şimşekli²,³ (¹School of Informatics, University of Edinburgh, UK; ²INRIA, France; ³CNRS, École Normale Supérieure, PSL Research University, France; ⁴Department of Computing, Imperial College London, UK) |
| Pseudocode | No | The paper describes algorithms and methods (e.g., 'simple yet effective algorithms for computing generalization indices', a 'Krylov approximation method') but does not provide structured pseudocode blocks or formally labeled 'Algorithm X' sections. A hedged sketch of one such computation appears after this table. |
| Open Source Code | Yes | We will make our entire implementation publicly available under: https://github.com/rorondre/TDAGeneralization. |
| Open Datasets | Yes | We consider several vision transformers [20] and graph neural networks (GNN) [30] trained on multiple datasets spanning regular and irregular data domains (cf. Figure 1c). ...CIFAR10 [44] and CIFAR100 [43] datasets. ...Super-pixel MNIST dataset [23]. |
| Dataset Splits | No | The paper describes varying hyperparameters (learning rate, batch size) on a grid for training and evaluating test risk, but does not explicitly mention or use a separate 'validation' dataset or split for model selection or early stopping. |
| Hardware Specification | Yes | We ran the experiments on 18 NVIDIA 2080Ti (11 GB) GPUs. ...All experiments were run on 18 Intel Xeon Silver 4114 CPUs. |
| Software Dependencies | No | The paper mentions software such as 'giotto-ph', 'krypy.linsys.Cg', 'scikit-learn', the 'timm' library, and the 'dgl' library, but does not provide version numbers for any of these dependencies. An illustrative pinned-requirements sketch follows the table. |
| Experiment Setup | Yes | By varying the learning rate (η) and the batch size (b), we define a grid of 6 × 6 hyperparameters. For each pair (η, b), we compute the training trajectory W_{t0,T} for 5 × 10^3 iterations. Unless specified, we use the ADAM optimizer [40]. ...For all experiments, we used a 0.1 proportion of the training data for the computation of the pseudo matrix, apart from CaiT and Swin on CIFAR100, where we used a 0.09 proportion due to memory constraints. A minimal sketch of this grid setup follows the table. |
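
Since the paper names its software stack without versions, a reproduction would need to pin them explicitly. The versions below are illustrative placeholders chosen by us, not versions stated in the paper:

```text
# requirements.txt (illustrative pins only; the paper does not state versions)
giotto-ph==0.2.2
krypy==2.2.0
scikit-learn==1.3.2
timm==0.9.12
dgl==1.1.2
```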
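For the experiment-setup row, here is a minimal sketch of the 6 × 6 (η, b) grid with trajectory recording. Only the grid shape, the ADAM optimizer, and the 5 × 10^3 iteration count come from the quoted text; the grid values, the toy model, and the synthetic data are our placeholders, not the paper's configuration:

```python
import itertools

import torch
from torch import nn

# Placeholder 6 x 6 grid: the paper varies (learning rate, batch size) on a
# 6 x 6 grid, but the exact values are not quoted above, so these are assumptions.
learning_rates = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3]
batch_sizes = [16, 32, 64, 128, 256, 512]

# Synthetic stand-in for the real datasets (CIFAR10/100, Super-pixel MNIST).
X = torch.randn(2048, 32)
y = torch.randint(0, 10, (2048,))

for lr, bs in itertools.product(learning_rates, batch_sizes):
    # Toy model standing in for the paper's vision transformers / GNNs.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # ADAM, as quoted
    loss_fn = nn.CrossEntropyLoss()
    trajectory = []  # flattened iterates w_t; input to the topological measures
    for step in range(5_000):  # 5 x 10^3 iterations, as in the quoted setup
        idx = torch.randint(0, X.shape[0], (bs,))
        loss = loss_fn(model(X[idx]), y[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 100 == 0:  # subsample iterates to keep the point cloud small
            trajectory.append(
                torch.cat([p.detach().flatten() for p in model.parameters()])
            )
```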
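For the pseudocode row, here is a hedged sketch of one topological quantity computable from such a trajectory with giotto-ph: the α-weighted sum of 0-dimensional persistence lifetimes over the recorded iterates, a standard ingredient of persistent-homology-based generalization measures. Treating this as one of the paper's 'generalization indices' is our assumption, and the function name and defaults are ours:

```python
import numpy as np
from gph import ripser_parallel  # giotto-ph


def alpha_weighted_lifetime_sum(points: np.ndarray, alpha: float = 1.0) -> float:
    """alpha-weighted sum of H0 lifetimes of a point cloud of iterates.

    `points` has shape (T, d): one flattened weight vector per recorded iterate.
    """
    # 0-dimensional persistence diagram of the Vietoris-Rips filtration.
    dgm0 = ripser_parallel(points, maxdim=0)["dgms"][0]
    finite = dgm0[np.isfinite(dgm0[:, 1])]  # drop the single infinite H0 bar
    lifetimes = finite[:, 1] - finite[:, 0]
    return float(np.sum(lifetimes ** alpha))


# E.g., applied to the trajectory recorded in the previous sketch:
# score = alpha_weighted_lifetime_sum(np.stack([w.numpy() for w in trajectory]))
```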