Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Authors: Rohan Ghosh, Mehul Motani

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose, in this work, a novel measure of complexity called Kolmogorov Growth (KG), which we use to derive new generalization error bounds that only depend on the final choice of the classification function. Guided by the bounds, we propose a novel way of regularizing neural networks by constraining the network trajectory to remain in the low KG zone during training. Minimizing KG while learning is akin to applying the Occam's razor to neural networks. The proposed approach, called network-to-network regularization, leads to clear improvements in the generalization ability of classifiers. We verify this for three popular image datasets (MNIST, CIFAR-10, CIFAR-100) across varying training data sizes.
Researcher Affiliation | Academia | Rohan Ghosh and Mehul Motani, Department of Electrical and Computer Engineering, N.1 Institute for Health, Institute of Data Science, National University of Singapore; rghosh92@gmail.com, motani@nus.edu.sg
Pseudocode | Yes | Algorithm 1: N2N Regularization (Multi-Level). (A hedged sketch of what such a regularized training step could look like appears after this table.)
Open Source Code | Yes | Code will be made available at https://github.com/rghosh92/N2N.
Open Datasets | Yes | We test N2N on three datasets: MNIST [17], CIFAR-10 [18] and CIFAR-100 [19].
Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 datasets, we report the accuracies using a 48k-2k training-validation split of the data for both, as we find it to yield best performance (due to hard convergence).
Hardware Specification | Yes | Experiments were either carried out on an RTX 2060 GPU or a Tesla V100 or A100 GPU.
Software Dependencies | No | The paper mentions specific network architectures (e.g., ResNet44, ResNet50, 5-layer CNN) but does not list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All networks were trained for a total of 200 iterations, and in each case the reported results are averaged over five networks. For all experiments we set e_base = 3, e_small = 1 in Algorithm 1. The values of the regularization parameters (λ0, λ1) are provided in the supplementary material. (A hedged data-split and configuration sketch appears after this table.)
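
The abstract and pseudocode rows above describe network-to-network (N2N) regularization only at a high level; the exact procedure is Algorithm 1 in the paper and is not reproduced here. The sketch below is a minimal, assumed interpretation: a smaller auxiliary network is fitted to mimic the main network, and the main network is additionally penalized by how poorly it can be mimicked, loosely capturing the "stay in the low KG zone" idea. The function names, the MSE mimicry loss, and the lambda0/lambda1 defaults are illustrative assumptions, not the authors' specification.

```python
# Hedged sketch (NOT the authors' exact Algorithm 1): one plausible reading of
# network-to-network (N2N) regularization. A smaller "probe" network is trained
# to mimic the main network's outputs, and the main network is penalized by how
# poorly the probe matches it. lambda0/lambda1 are hypothetical stand-ins for
# the paper's (λ0, λ1), whose values live in the supplementary material.
import torch
import torch.nn.functional as F

def n2n_training_step(main_net, small_net, opt_main, opt_small,
                      x, y, lambda0=0.1):
    """One training step with an assumed N2N-style regularizer."""
    # 1) Fit the small network to the main network's current outputs (mimicry).
    with torch.no_grad():
        target_logits = main_net(x)
    small_logits = small_net(x)
    mimic_loss = F.mse_loss(small_logits, target_logits)
    opt_small.zero_grad()
    mimic_loss.backward()
    opt_small.step()

    # 2) Update the main network: task loss plus a penalty that keeps its
    #    function close to what the smaller network can represent.
    main_logits = main_net(x)
    task_loss = F.cross_entropy(main_logits, y)
    with torch.no_grad():
        small_logits = small_net(x)
    n2n_penalty = F.mse_loss(main_logits, small_logits)
    loss = task_loss + lambda0 * n2n_penalty  # a second level would add a lambda1 term
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
    return task_loss.item(), n2n_penalty.item()
```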
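
The dataset-split and experiment-setup rows quote concrete numbers: a 48k-2k training-validation split, 200 training iterations, five networks per setting, and e_base = 3, e_small = 1. Since the paper does not pin down its software stack, the snippet below shows how such a split could be built with torchvision; the library choice, transforms, batch size, and seed are assumptions, and the (λ0, λ1) values are left to the paper's supplementary material.

```python
# Hedged sketch: a 48k-2k training-validation split of CIFAR-100, matching the
# numbers quoted in the table. The use of torchvision, the transform, the batch
# size, and the seed are assumptions; the paper does not specify its software stack.
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's preprocessing is unspecified
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transform)

generator = torch.Generator().manual_seed(0)  # assumed seed, for reproducibility only
train_set, val_set = random_split(full_train, [48_000, 2_000], generator=generator)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)

# Hyperparameters quoted in the Experiment Setup row; (λ0, λ1) are not
# reproduced here because the paper defers them to the supplementary material.
config = {"training_iterations": 200, "runs_per_setting": 5,
          "e_base": 3, "e_small": 1}
```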