Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Authors: Charles H. Martin, Michael W. Mahoney
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Main Empirical Results: "Our main empirical results consist in evaluating empirically the ESDs (and related RMT-based statistics) for weight matrices for a suite of DNN models, thereby probing the Energy Landscapes of these DNNs." Here ESD is the Empirical Spectral Density of a weight matrix and RMT is Random Matrix Theory; a minimal sketch of the ESD computation appears below the table. |
| Researcher Affiliation | Collaboration | ¹Calculation Consulting, 8 Locksley Ave, 6B, San Francisco, CA 94122; ²ICSI and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720. Correspondence to: Charles H. Martin <charles@CalculationConsulting.com>, Michael W. Mahoney <mmahoney@stat.berkeley.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to a third-party GitHub repository (https://github.com/deepmind/sonnet/blob/master/sonnet/python/modules/nets/alexnet.py) for the MiniAlexNet architecture, but the authors make no explicit statement that they are releasing their own source code for the methodology described in the paper. |
| Open Datasets | Yes | We used Keras 2.0, using 20 epochs of the AdaDelta optimizer, on the MNIST data set. |
| Dataset Splits | No | The paper mentions '100.00% training accuracy, and 99.25% test accuracy on the default MNIST split' and 'Training and Test Accuracies' for MiniAlexNet. Training and test splits are therefore implied, but no specifics of a validation split (percentages, counts, or explicit mention of a validation set) are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | Yes | We used Keras 2.0, with TensorFlow as a backend. |
| Experiment Setup | Yes | All models are trained using Keras 2.x, with TensorFlow as a backend. We use SGD with momentum, with a learning rate of 0.01, a momentum parameter of 0.9, and a baseline batch size of 32; and we train up to 100 epochs. A hedged sketch of this setup follows the table. |
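
As a companion to the Main Empirical Results row, here is a minimal sketch of the core measurement the paper describes: the ESD of a layer weight matrix, i.e. the eigenvalue distribution of the correlation matrix X = WᵀW/N. This is not the authors' code; the matrix shapes, the Gaussian test matrix, and the optional `powerlaw` tail fit are illustrative assumptions.

```python
import numpy as np

def weight_matrix_esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N (the ESD support).

    By convention W is N x M with N >= M, so X is the smaller M x M matrix.
    """
    if W.shape[0] < W.shape[1]:
        W = W.T
    N, M = W.shape
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)  # M real eigenvalues, ascending

# Sanity check on a random Gaussian matrix: its ESD should follow the
# Marchenko-Pastur law, with bulk edge lambda_+ = (1 + sqrt(M/N))^2 ~ 2.91
# for N=1000, M=500 -- i.e. no heavy tail, unlike well-trained DNN weights.
rng = np.random.default_rng(0)
eigs = weight_matrix_esd(rng.normal(size=(1000, 500)))
print(f"lambda_max = {eigs.max():.2f}")

# Heavy-tail diagnostic (assumption: the third-party `powerlaw` package):
#   import powerlaw; alpha = powerlaw.Fit(eigs).alpha
```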
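
And a hedged sketch of the Experiment Setup row: Keras with a TensorFlow backend, SGD with a learning rate of 0.01, momentum 0.9, batch size 32, trained for up to 100 epochs on MNIST. The two-layer dense model is a hypothetical stand-in, not the paper's MiniAlexNet, and the `learning_rate=` keyword reflects current Keras rather than the Keras 2.x API quoted in the paper.

```python
from tensorflow import keras

# Data: the default MNIST split from Keras, flattened and scaled to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Hypothetical stand-in model -- NOT the paper's MiniAlexNet architecture.
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Optimizer settings quoted in the Experiment Setup row: SGD with momentum,
# learning rate 0.01, momentum 0.9, batch size 32, up to 100 epochs.
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=100,
          validation_data=(x_test, y_test))
```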