Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Which Algorithms Have Tight Generalization Bounds?

Authors: Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical findings, presented in Section L, suggest that neural networks are indeed quite stable. To substantiate this intuition, we conduct simple preliminary experiments to estimate the stability of neural networks in practice.
Researcher Affiliation	Academia	Michael Gastpar (EPFL); Ido Nachum (University of Haifa); Jonathan Shafer (MIT); Thomas Weinberger (EPFL)
Pseudocode	No	The paper describes algorithms conceptually and refers to external algorithms, but it does not contain a structured pseudocode block or algorithm steps as a figure or dedicated section within its own content.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: the experiments are elementary; nevertheless, if reviewers wish to see the code, we can provide it.
Open Datasets	Yes	Here, we examine if there are practical algorithms that admit loss stability or even hypothesis stability with substantial numerical values. To this end, we conduct experiments over a simple neural network architecture across four datasets: MNIST, Fashion MNIST, CIFAR10, and CIFAR10 with random labels (figures 1-4, respectively).
Dataset Splits	No	The training procedure is as follows: we train two models in tandem, starting from the same random initialization. The first model is provided with the full training set, whereas the second model has k = 100 data points removed from its training set. These points are drawn uniformly at random before the beginning of the training, and fixed thereafter.
Hardware Specification	No	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No]. Justification: Very simple experiments executed on a laptop, no special resources needed.
Software Dependencies	No	We train the models using stochastic gradient descent (SGD) with a momentum factor of 0.9 and a batch size of 1000, optimizing the cross-entropy loss.
Experiment Setup	Yes	Throughout all experiments, we employ one-hidden-layer perceptrons with 512 hidden neurons. We train the models using stochastic gradient descent (SGD) with a momentum factor of 0.9 and a batch size of 1000, optimizing the cross-entropy loss. For every data set, we train the models across learning rates 0.1, 0.035, and 0.01. We average all the curves over 10 random seeds (tied for the pairs of networks) and plot the standard deviation for all the curves. The number of training epochs is {50, 150, 150, 300} for {MNIST, FMNIST, CIFAR10, CIFAR10 random}, respectively.
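
The paper does not release code, so the following is only a minimal sketch of how the quoted Dataset Splits and Experiment Setup rows could be written down as a configuration. It records the stated hyperparameters and shows one way to draw the k = 100 removed points once, before training, so that the paired models share everything except those points. All names (`SetupConfig`, `EPOCHS`, `paired_training_indices`, `experiment_grid`) are hypothetical, and the use of seeds 0 to 9 and Python's `random` module are assumptions; the paper states only that 10 tied seeds were used. No model training is performed here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    # Values quoted in the Experiment Setup and Dataset Splits rows.
    hidden_units: int = 512            # one-hidden-layer perceptron
    momentum: float = 0.9              # SGD momentum factor
    batch_size: int = 1000
    learning_rates: tuple = (0.1, 0.035, 0.01)
    num_seeds: int = 10                # assumption: seeds are 0..9
    k_removed: int = 100               # points withheld from the second model


# Training epochs per dataset, as quoted in the Experiment Setup row.
EPOCHS = {"MNIST": 50, "FMNIST": 150, "CIFAR10": 150, "CIFAR10 random": 300}


def paired_training_indices(n_train, k, seed):
    """Return (full, reduced) index lists for the two models in a pair.

    The k removed indices are drawn uniformly at random once, before
    training, and are fully determined by the seed.
    """
    rng = random.Random(seed)
    removed = set(rng.sample(range(n_train), k))
    full = list(range(n_train))
    reduced = [i for i in full if i not in removed]
    return full, reduced


def experiment_grid(cfg=SetupConfig()):
    """Yield every (dataset, epochs, learning_rate, seed) combination."""
    for dataset, epochs in EPOCHS.items():
        for lr in cfg.learning_rates:
            for seed in range(cfg.num_seeds):
                yield dataset, epochs, lr, seed
```

For example, `paired_training_indices(60000, SetupConfig().k_removed, seed=0)` would give the index sets for one MNIST pair (60000 being the size of the standard MNIST training set). Reusing the same seed for both models' initialization would be one way to realize the "tied" seeds the paper mentions.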