Data-Dependent Stability of Stochastic Gradient Descent

Authors: Ilja Kuzborskij, Christoph Lampert

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further corroborate our findings empirically and show that, indeed, the data-dependent generalization bound is tighter than the worst-case counterpart on non-convex objective functions." and "Next we empirically assess the tightness of our non-convex generalization bounds on real data."
Researcher Affiliation | Academia | (1) University of Milan, Italy; (2) IST Austria.
Pseudocode | No | No explicit pseudocode or algorithm block was found. The paper describes the SGD update rule in textual form but does not present a structured algorithm (a hedged sketch of the standard update is given after the table).
Open Source Code | No | No concrete access to source code (e.g., a repository link, an explicit statement of code release, or code in the supplementary materials) for the described methodology was found.
Open Datasets | Yes | "Next we empirically assess the tightness of our non-convex generalization bounds on real data." In the reported experiment, the authors train a neural network with three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units, on the MNIST dataset (see the architecture sketch after the table).
Dataset Splits | No | The paper mentions "validation and training average losses", which implies the use of validation data, but no specific split information (percentages, sample counts, or an explicit splitting methodology) is provided for reproducibility.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments are mentioned in the paper.
Experiment Setup | No | The paper describes the network architecture (three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units) and mentions multiple warm-starts (7 warm-starts, one every 200 steps). However, it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings), which are needed for full reproducibility (see the warm-start training sketch after the table).
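
For reference, the SGD update rule that the paper describes only in text takes the standard form below. The notation here (parameters w_t, step size alpha_t, training example z_{i_t} sampled at step t) is the usual one and is not quoted from the paper.

```latex
% Standard SGD update, as described textually in the paper (notation assumed):
% w_t are the parameters at step t, \alpha_t is the step size, and z_{i_t} is
% the training example sampled at step t.
\[
  w_{t+1} \;=\; w_t \;-\; \alpha_t \,\nabla_w f\!\left(w_t;\, z_{i_t}\right)
\]
```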
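To make the quoted architecture description concrete, here is a minimal PyTorch sketch of a network matching it: three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units, for MNIST inputs. The channel counts, kernel sizes, activations, and output head are illustrative assumptions; the paper does not specify them.

```python
# Sketch of the described architecture; unspecified details are assumptions.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # conv 1 (assumed 16 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv 2 (assumed 32 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
            nn.Conv2d(32, 32, kernel_size=3, padding=1),  # conv 3 (assumed 32 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 7x7 -> 3x3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 3 * 3, 16),                    # fully connected layer with 16 units
            nn.ReLU(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```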
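One plausible reading of the warm-start protocol is training in segments of 200 SGD steps and reusing each segment's endpoint as the starting point of the next, for 7 segments in total. The sketch below illustrates that reading; the function name, optimizer settings, learning rate, batch size, and loss are placeholders, since the paper does not report them.

```python
# Hedged sketch of a warm-started SGD run: 7 segments of 200 steps each,
# snapshotting the parameters reached at every warm-start point.
# Hyperparameter values below are assumptions, not taken from the paper.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def run_warm_started_sgd(model, num_warm_starts: int = 7, steps_per_segment: int = 200,
                         lr: float = 0.01, batch_size: int = 64):
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # plain SGD assumed
    loss_fn = torch.nn.CrossEntropyLoss()

    checkpoints = []
    data_iter = iter(loader)
    for _segment in range(num_warm_starts):
        for _ in range(steps_per_segment):
            try:
                x, y = next(data_iter)
            except StopIteration:          # restart the epoch when the loader is exhausted
                data_iter = iter(loader)
                x, y = next(data_iter)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        # Snapshot the parameters reached after this 200-step segment; these
        # warm-started points are where the bounds would be evaluated.
        checkpoints.append({k: v.detach().clone() for k, v in model.state_dict().items()})
    return checkpoints
```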