Spectrally-normalized margin bounds for neural networks

Authors: Peter L. Bartlett, Dylan J. Foster, Matus J. Telgarsky

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical investigation, in Section 2, of neural network generalization on the standard datasets cifar10, cifar100, and mnist using the preceding bound.
Researcher Affiliation | Academia | Peter L. Bartlett (<peter@berkeley.edu>): University of California, Berkeley and Queensland University of Technology; Dylan J. Foster (<djf244@cornell.edu>): Cornell University; Matus J. Telgarsky (<mjt@illinois.edu>): University of Illinois, Urbana-Champaign.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions implementing its experiments in Keras and cites a Keras reference, but provides no link to or statement about its own open-source code.
Open Datasets | Yes | An empirical investigation, in Section 2, of neural network generalization on the standard datasets cifar10, cifar100, and mnist using the preceding bound.
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits with percentages or sample counts, nor does it cite predefined splits for reproducibility.
Hardware Specification | No | The acknowledgments note the use of a GPU machine provided by Karthik Sridharan and made possible by an NVIDIA GPU grant, but no GPU model number or other hardware details are given.
Software Dependencies | No | All experiments were implemented in Keras [Chollet et al., 2015], but no Keras version number is specified, which would be needed for reproducibility.
Experiment Setup | Yes | All experiments were implemented in Keras [Chollet et al., 2015]. In order to minimize conflating effects of optimization and regularization, the optimization method was vanilla SGD with step size 0.01, and all regularization (weight decay, batch normalization, etc.) was disabled. cifar in general refers to cifar10, though cifar100 is also mentioned explicitly. The network architecture is essentially AlexNet [Krizhevsky et al., 2012] with all normalization/regularization removed, and with no adjustments of any kind (even to the learning rate) across the different experiments. (A minimal code sketch of this configuration follows the table.)
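
For concreteness, the configuration described in the Experiment Setup row can be sketched in code. This is a minimal, illustrative sketch using the modern tensorflow.keras API (the paper predates it and cites 2015-era Keras); the AlexNet-style layer widths, batch size, and epoch count are assumptions of ours, since they are not specified in the responses above.

```python
# Minimal sketch of the reported setup: vanilla SGD with step size 0.01,
# no weight decay, no batch normalization, AlexNet-style convolutional net.
# Layer widths, batch size, and epoch count below are illustrative guesses,
# not the authors' exact configuration.
from tensorflow.keras import layers, models, optimizers, datasets

# cifar10 is one of the standard datasets mentioned; cifar100 and mnist
# are loaded analogously via datasets.cifar100 / datasets.mnist.
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# AlexNet-style stack scaled to 32x32 inputs, with no BatchNormalization
# layers and no kernel_regularizer arguments ("all regularization disabled").
model = models.Sequential([
    layers.Conv2D(64, 5, padding="same", activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(384, activation="relu"),
    layers.Dense(192, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Vanilla SGD, step size 0.01, no momentum, no learning-rate schedule,
# matching "no adjustments of any kind across the different experiments".
model.compile(
    optimizer=optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(x_train, y_train, batch_size=32, epochs=10,
          validation_data=(x_test, y_test))
```

Switching to cifar100 or mnist only requires changing the dataset loader, the input shape (28x28x1 for mnist), and the width of the final softmax layer.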
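
The "preceding bound" referred to in the Research Type and Open Datasets rows is not restated on this page; per the paper's title, its central ingredient is the spectral norm of each layer's weight matrix. Below is an assumption-laden sketch of extracting those norms from the model in the previous snippet; flattening convolution kernels into 2-D matrices is a simplification, not the authors' exact treatment of convolutions.

```python
# Per-layer spectral norms (largest singular values) of a trained Keras model.
# Assumes the `model` object from the previous sketch; convolution kernels are
# simply reshaped to 2-D, which is a simplifying assumption.
import numpy as np

def layer_spectral_norms(model):
    norms = []
    for layer in model.layers:
        weights = layer.get_weights()
        if not weights:
            continue  # pooling/flatten layers carry no kernel
        kernel = weights[0]                            # kernel is weights[0], bias is weights[1]
        matrix = kernel.reshape(-1, kernel.shape[-1])  # (fan-in, fan-out)
        norms.append((layer.name, float(np.linalg.norm(matrix, ord=2))))
    return norms

for name, sigma in layer_spectral_norms(model):
    print(f"{name}: spectral norm = {sigma:.3f}")
```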