Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization
Authors: Rohan Ghosh, Mehul Motani
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we propose, in this work, a novel measure of complexity called Kolmogorov Growth (KG), which we use to derive new generalization error bounds that only depend on the final choice of the classification function. Guided by the bounds, we propose a novel way of regularizing neural networks by constraining the network trajectory to remain in the low KG zone during training. Minimizing KG while learning is akin to applying Occam's razor to neural networks. The proposed approach, called network-to-network regularization, leads to clear improvements in the generalization ability of classifiers. We verify this for three popular image datasets (MNIST, CIFAR-10, CIFAR-100) across varying training data sizes. |
| Researcher Affiliation | Academia | Rohan Ghosh and Mehul Motani, Department of Electrical and Computer Engineering, N.1 Institute for Health, Institute of Data Science, National University of Singapore; rghosh92@gmail.com, motani@nus.edu.sg |
| Pseudocode | Yes | Algorithm 1 N2N Regularization (Multi-Level) |
| Open Source Code | Yes | Code will be made available at https://github.com/rghosh92/N2N. |
| Open Datasets | Yes | We test N2N on three datasets: MNIST [17], CIFAR-10 [18] and CIFAR-100 [19]. |
| Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we report the accuracies using a 48k-2k training-validation split of the data for both, as we find it to yield best performance (due to hard convergence). |
| Hardware Specification | Yes | Experiments were either carried out on an RTX 2060 GPU or a Tesla V100 or A100 GPU. |
| Software Dependencies | No | The paper mentions using specific network architectures (e.g., ResNet44, ResNet50, 5-layer CNN) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All networks were trained for a total of 200 iterations, and in each case results reported are averaged over five networks. For all experiments we set e_base = 3, e_small = 1 in Algorithm 1. The values of the regularization parameters (λ_0, λ_1) are provided in the supplementary material. |
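
The Pseudocode and Experiment Setup rows above refer to Algorithm 1 (N2N Regularization, Multi-Level) and its hyperparameters e_base, e_small, λ_0, λ_1. Below is a minimal, hypothetical PyTorch sketch of a single training step with a network-to-network penalty. The function name `n2n_regularized_step`, the frozen lower-complexity reference network, and the KL-divergence form of the penalty are illustrative assumptions, not the authors' released implementation; the exact procedure is given in Algorithm 1 and the supplementary material.

```python
# Hypothetical sketch of an N2N-style regularized training step (PyTorch).
# One plausible reading of "constraining the network trajectory to remain in
# the low KG zone": penalize the main network for drifting away from a
# lower-complexity reference network's predictions.
import torch
import torch.nn.functional as F


def n2n_regularized_step(main_net, ref_net, x, y, optimizer, lam=1e-2):
    """One training step with a network-to-network penalty.

    main_net : the classifier being trained
    ref_net  : a frozen, lower-complexity reference network
    lam      : regularization weight (stands in for lambda_0 / lambda_1)
    """
    optimizer.zero_grad()
    logits = main_net(x)
    task_loss = F.cross_entropy(logits, y)

    # Reference predictions are not updated here.
    with torch.no_grad():
        ref_logits = ref_net(x)

    # Penalize divergence between the two networks' predictive distributions.
    n2n_penalty = F.kl_div(
        F.log_softmax(logits, dim=1),
        F.softmax(ref_logits, dim=1),
        reduction="batchmean",
    )

    loss = task_loss + lam * n2n_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```

This sketch collapses the multi-level schedule of Algorithm 1 into a single step: a single weight `lam` stands in for (λ_0, λ_1), and the phases governed by e_base = 3 and e_small = 1 are omitted.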
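
The Dataset Splits row quotes a 48k-2k training-validation split of the CIFAR training data. The snippet below is a sketch of how such a split could be reproduced with torchvision; the fixed seed and the use of `random_split` are assumptions about the splitting procedure, which the quoted text does not specify.

```python
# Illustrative 48k/2k training-validation split of CIFAR-100 (PyTorch).
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transform)  # 50,000 images

# 48,000 for training, 2,000 held out for validation; a fixed generator seed
# keeps the split reproducible across runs (an assumption, not stated in the paper).
train_set, val_set = random_split(
    full_train, [48_000, 2_000],
    generator=torch.Generator().manual_seed(0),
)
```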