Data-Dependent Stability of Stochastic Gradient Descent
Authors: Ilja Kuzborskij, Christoph Lampert
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We further corroborate our findings empirically and show that, indeed, the data-dependent generalization bound is tighter than the worst-case counterpart on non-convex objective functions." and "Next we empirically assess the tightness of our non-convex generalization bounds on real data." |
| Researcher Affiliation | Academia | ¹University of Milan, Italy; ²IST Austria. |
| Pseudocode | No | No explicit pseudocode or algorithm block was found. The paper describes the SGD update rule only in textual form and does not present a structured algorithm (a hedged reconstruction of the update it refers to appears after this table). |
| Open Source Code | No | No concrete access to source code (e.g., repository link, explicit statement of code release, or code in supplementary materials) for the described methodology was found. |
| Open Datasets | Yes | Next we empirically assess the tightness of our non-convex generalization bounds on real data. In the following experiment we train a neural network with three convolutional layers interlaced with max-pooling, followed by the fully connected layer with 16 units, on the MNIST dataset. |
| Dataset Splits | No | The paper mentions 'validation and training average losses', which implies that held-out validation data was used, but it provides no specific split information (percentages, sample counts, or splitting methodology) needed for reproducibility. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment were mentioned in the paper. |
| Experiment Setup | No | The paper describes the neural network architecture (three convolutional layers interlaced with max-pooling, followed by a fully connected layer with 16 units) and mentions using multiple warm-starts (7 warm-starts, taken every 200 steps). However, it does not report the hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) that would be needed to fully reproduce the experimental setup; a hypothetical sketch with assumed hyperparameters appears after this table. |
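
On the pseudocode row: the paper states its SGD update only in prose, so the following is a minimal reconstruction of the standard single-example SGD update it refers to. The symbols here (step size α_t, loss f, sampled example z_{i_t}, training-set size m) are assumed notation, not quoted from the paper.

```latex
% Standard SGD update on a uniformly sampled training example (assumed notation):
w_{t+1} = w_t - \alpha_t \, \nabla_w f(w_t; z_{i_t}),
\qquad i_t \sim \mathrm{Uniform}\{1, \dots, m\}
```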
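
On the experiment-setup row: the sketch below illustrates the kind of setup the paper describes (three convolutional layers interlaced with max-pooling, a fully connected layer with 16 units, MNIST, plain SGD, warm-start snapshots every 200 steps). It is not the authors' code; channel counts, kernel sizes, learning rate, and batch size are placeholder assumptions, since the paper does not report them.

```python
# Hypothetical sketch of the described MNIST setup; all hyperparameters
# (channels, kernel sizes, learning rate, batch size) are assumptions.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(  # three conv layers interlaced with max-pooling
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 7x7 -> 3x3
    nn.Flatten(),
    nn.Linear(32 * 3 * 3, 16), nn.ReLU(),  # fully connected layer with 16 units
    nn.Linear(16, 10),
)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)   # batch size assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # learning rate assumed
criterion = nn.CrossEntropyLoss()

warm_starts = []  # model snapshots taken every 200 SGD steps (7 in total)
step = 0
for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    step += 1
    if step % 200 == 0 and len(warm_starts) < 7:
        warm_starts.append(copy.deepcopy(model.state_dict()))
    if len(warm_starts) == 7:
        break
```

Each stored `state_dict` can then be loaded back into the model to continue training from that warm-start, which mirrors the paper's use of multiple warm-starts when estimating the bounds.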