Data-Dependent Stability of Stochastic Gradient Descent

Authors: Ilja Kuzborskij, Christoph Lampert

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further corroborate our findings empirically and show that, indeed, the data-dependent generalization bound is tighter than the worst-case counterpart on non-convex objective functions." and "Next we empirically assess the tightness of our non-convex generalization bounds on real data."
Researcher Affiliation | Academia | (1) University of Milan, Italy; (2) IST Austria.
Pseudocode | No | No explicit pseudocode or algorithm block was found. The paper describes the SGD update rule in textual form but does not present a structured algorithm (a hedged sketch of the standard update is given after the table).
Open Source Code | No | No concrete access to source code (e.g., a repository link, an explicit statement of code release, or code in the supplementary materials) for the described methodology was found.
Open Datasets | Yes | "Next we empirically assess the tightness of our non-convex generalization bounds on real data." In the reported experiment, the authors train a neural network with three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units, on the MNIST dataset (see the architecture sketch after the table).
Dataset Splits | No | The paper mentions "validation and training average losses", which implies the use of validation data, but no specific split information (percentages, sample counts, or an explicit splitting methodology) is provided for reproducibility.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments are mentioned in the paper.
Experiment Setup | No | The paper describes the network architecture (three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units) and mentions multiple warm-starts (7 warm-starts, one every 200 steps). However, it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings), which are needed for full reproducibility (see the warm-start training sketch after the table).
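
For reference, the SGD update rule that the paper describes only in text takes the standard form below. The notation here (parameters w_t, step size alpha_t, training example z_{i_t} sampled at step t) is the usual one and is not quoted from the paper.

```latex
% Standard SGD update, as described textually in the paper (notation assumed):
% w_t are the parameters at step t, \alpha_t is the step size, and z_{i_t} is
% the training example sampled at step t.
\[
  w_{t+1} \;=\; w_t \;-\; \alpha_t \,\nabla_w f\!\left(w_t;\, z_{i_t}\right)
\]
```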
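To make the quoted architecture description concrete, here is a minimal PyTorch sketch of a network matching it: three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units, for MNIST inputs. The channel counts, kernel sizes, activations, and output head are illustrative assumptions; the paper does not specify them.

```python
# Sketch of the described architecture; unspecified details are assumptions.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # conv 1 (assumed 16 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv 2 (assumed 32 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
            nn.Conv2d(32, 32, kernel_size=3, padding=1),  # conv 3 (assumed 32 channels)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 7x7 -> 3x3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 3 * 3, 16),                    # fully connected layer with 16 units
            nn.ReLU(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```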
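One plausible reading of the warm-start protocol is training in segments of 200 SGD steps and reusing each segment's endpoint as the starting point of the next, for 7 segments in total. The sketch below illustrates that reading; the function name, optimizer settings, learning rate, batch size, and loss are placeholders, since the paper does not report them.

```python
# Hedged sketch of a warm-started SGD run: 7 segments of 200 steps each,
# snapshotting the parameters reached at every warm-start point.
# Hyperparameter values below are assumptions, not taken from the paper.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def run_warm_started_sgd(model, num_warm_starts: int = 7, steps_per_segment: int = 200,
                         lr: float = 0.01, batch_size: int = 64):
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # plain SGD assumed
    loss_fn = torch.nn.CrossEntropyLoss()

    checkpoints = []
    data_iter = iter(loader)
    for _segment in range(num_warm_starts):
        for _ in range(steps_per_segment):
            try:
                x, y = next(data_iter)
            except StopIteration:          # restart the epoch when the loader is exhausted
                data_iter = iter(loader)
                x, y = next(data_iter)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        # Snapshot the parameters reached after this 200-step segment; these
        # warm-started points are where the bounds would be evaluated.
        checkpoints.append({k: v.detach().clone() for k, v in model.state_dict().items()})
    return checkpoints
```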