Stronger Generalization Bounds for Deep Nets via a Compression Approach
Authors: Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All are empirically studied, including their correlation with generalization (Section 6). Experiments were performed by training a VGG-19 architecture (Simonyan and Zisserman, 2014) and an AlexNet (Krizhevsky et al., 2012) for a multi-class classification task on the CIFAR-10 dataset. |
| Researcher Affiliation | Academia | 1Princeton University, Computer Science Department 2Duke University, Computer Science Department 3Institute for Advanced Study, School of Mathematics. |
| Pseudocode | Yes | Algorithm 1 Matrix-Project (A, ε, η). A hedged reconstruction of this compression routine is sketched below the table. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is open source or publicly available. |
| Open Datasets | Yes | Experiments were performed by training a VGG-19 architecture (Simonyan and Zisserman, 2014) and an AlexNet (Krizhevsky et al., 2012) for a multi-class classification task on the publicly available CIFAR-10 dataset. |
| Dataset Splits | No | The paper mentions '92.45% validation accuracy' for VGG-19, indicating the use of a validation set. However, it does not provide specific details on the split percentages or sample counts for this validation set, which are necessary for full reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimization techniques like 'SGD with mini-batch size 128', but it does not list any specific software or library names with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0'). |
| Experiment Setup | Yes | Optimization used SGD with mini-batch size 128, weight decay 5e-4, momentum 0.9, and an initial learning rate of 0.05, decayed by a factor of 2 every 30 epochs. Dropout was used in the fully-connected layers. A hedged training-configuration sketch follows the table. |
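
The Pseudocode row refers to Algorithm 1, Matrix-Project (A, ε, η), the paper's basic matrix-compression primitive: a weight matrix is replaced by an unbiased estimator built from random sign matrices drawn from a shared seed, so that only a small set of inner-product coefficients needs to be stored. The sketch below is a minimal NumPy reconstruction under the assumption that roughly k ≈ log(1/η)/ε² sign matrices are used; the function name `matrix_project` and the exact constant are illustrative, not the authors' code.

```python
import numpy as np

def matrix_project(A, eps, eta, seed=0):
    """Hedged sketch of Matrix-Project(A, eps, eta): compress A into k inner
    products with random +/-1 matrices generated from a shared seed, and
    return the unbiased estimate A_hat = (1/k) * sum_i <A, M_i> M_i."""
    k = int(np.ceil(np.log(1.0 / eta) / eps ** 2))   # number of sign matrices (assumed choice)
    rng = np.random.default_rng(seed)
    A_hat = np.zeros_like(A, dtype=float)
    coeffs = np.empty(k)                             # the k scalars are all that must be stored
    for i in range(k):
        M = rng.choice([-1.0, 1.0], size=A.shape)    # i.i.d. +/-1 entries, reproducible from the seed
        coeffs[i] = np.sum(A * M)                    # <A, M_i>
        A_hat += coeffs[i] * M
    return A_hat / k, coeffs
```

Because E[⟨A, M⟩ M] = A for i.i.d. ±1 entries, the averaged estimate is unbiased, and storing only the k coefficients plus the seed yields the short description length that the compression-based generalization bound is measured against.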
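The Experiment Setup row lists the reported hyperparameters (SGD, mini-batch size 128, weight decay 5e-4, momentum 0.9, initial learning rate 0.05 halved every 30 epochs, dropout in the fully-connected layers). Since the paper names no framework or versions (see the Software Dependencies row), the following is a minimal PyTorch/torchvision sketch of that configuration; the use of `torchvision.models.vgg19`, the CIFAR-10 download path, and the total epoch count are assumptions for illustration only.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR-10 loader (torchvision is an assumed dependency; the paper does not name one).
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# VGG-19 adapted to 10 classes; torchvision's variant uses dropout in its
# fully-connected classifier, matching the reported setup.
model = torchvision.models.vgg19(num_classes=10)

# Reported optimization settings: SGD, lr 0.05, momentum 0.9, weight decay 5e-4,
# learning rate decayed by a factor of 2 every 30 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(120):                      # total number of epochs is an assumption
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```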