Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
Authors: Vaishnavh Nagarajan, Zico Kolter
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1, we show how the terms in the bound vary for networks of varying depth with a small width of H = 40 on the MNIST dataset. We observe that B_{layer-ℓ2}, B_{output}, B_{jac-row-ℓ2}, B_{jac-spec} typically lie in the range of [10^0, 10^2] and scale with depth as 1.57^D. |
| Researcher Affiliation | Collaboration | Vaishnavh Nagarajan Department of Computer Science Carnegie Mellon University Pittsburgh, PA vaishnavh@cs.cmu.edu J. Zico Kolter Department of Computer Science Carnegie Mellon University & Bosch Center for AI Pittsburgh, PA zkolter@cs.cmu.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | In Figure 1, we show how the terms in the bound vary for networks of varying depth with a small width of H = 40 on the MNIST dataset. |
| Dataset Splits | No | The paper mentions training on a subset of the MNIST dataset but does not explicitly detail training, validation, and test splits with specific percentages or counts. For example, it does not mention a separate validation set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions running experiments with networks of varying depths and widths (e.g., H=40, H=1280) but no specific GPU, CPU, or other hardware details are provided. |
| Software Dependencies | No | The paper mentions using 'SGD with learning rate 0.1 and mini-batch size 64' and 'Adam with a learning rate of 10^-5' as optimization algorithms, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication. |
| Experiment Setup | Yes | In all the experiments, including the ones in the main paper (except the one in Figure 2 (b)), we use SGD with learning rate 0.1 and mini-batch size 64. We train the network on a subset of 4096 random training examples from the MNIST dataset to minimize cross-entropy loss. We stop training when we classify at least 0.99 of the data perfectly, with a margin of γ_class = 10. In Figure 2 (b), where we train networks of depth D = 28, the above training procedure is quite unstable; instead, we use Adam with a learning rate of 10^-5 until the network achieves an accuracy of 0.95 on the training dataset. (A code sketch of this setup follows the table.) |
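
Below is a minimal sketch of the experiment setup quoted above. The paper does not release code or name a framework, so PyTorch, the `make_mlp` / `margin_accuracy` / `train` helpers, and the width default of H = 40 are assumptions introduced here for illustration; only the hyperparameters (4096 random MNIST examples, SGD with lr 0.1 and batch size 64, margin-based stopping at γ_class = 10, and Adam with lr 10^-5 for depth-28 networks) come from the paper.

```python
# Hedged sketch of the described training setup; PyTorch and all helper
# names are assumptions, not the authors' released code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms


def make_mlp(depth: int, width: int = 40, in_dim: int = 784, n_classes: int = 10) -> nn.Sequential:
    """Fully connected ReLU network with `depth` hidden layers of size `width` (H = 40 in Figure 1)."""
    layers, d = [nn.Flatten()], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)


def margin_accuracy(model: nn.Module, loader: DataLoader, gamma: float) -> float:
    """Fraction of examples whose correct logit exceeds every other logit by at least `gamma`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            logits = model(x)
            true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
            others = logits.clone()
            others.scatter_(1, y.unsqueeze(1), float("-inf"))
            margin = true_logit - others.max(dim=1).values
            correct += (margin >= gamma).sum().item()
            total += y.numel()
    return correct / total


def train(depth: int) -> nn.Sequential:
    # Subset of 4096 random MNIST training examples, as stated in the paper.
    mnist = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    idx = torch.randperm(len(mnist))[:4096].tolist()
    train_set = Subset(mnist, idx)
    loader = DataLoader(train_set, batch_size=64, shuffle=True)
    eval_loader = DataLoader(train_set, batch_size=512)

    model = make_mlp(depth)
    loss_fn = nn.CrossEntropyLoss()

    if depth < 28:
        # SGD, learning rate 0.1, mini-batch size 64; stop when at least 99% of
        # the training data is classified with margin gamma_class = 10.
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        stop = lambda: margin_accuracy(model, eval_loader, gamma=10.0) >= 0.99
    else:
        # Depth-28 networks (Figure 2 (b)): Adam with learning rate 1e-5 until
        # plain training accuracy (margin 0) reaches 0.95.
        opt = torch.optim.Adam(model.parameters(), lr=1e-5)
        stop = lambda: margin_accuracy(model, eval_loader, gamma=0.0) >= 0.95

    while not stop():
        model.train()
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model


if __name__ == "__main__":
    train(depth=8)
```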