Regularizing CNNs with Locally Constrained Decorrelations

Authors: Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca

ICLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental In this work we hypothesize that regularizing negatively correlated features is an obstacle for achieving better results and we introduce OrthoReg, a novel regularization technique that addresses the performance margin issue by only regularizing positively correlated feature weights. Moreover, OrthoReg is computationally efficient since it only regularizes the feature weights, which makes it very suitable for the latest CNN models. We verify our hypothesis through a series of experiments: first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge.
Researcher Affiliation Collaboration Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca; Computer Vision Center, Univ. Autònoma de Barcelona (UAB), 08193 Bellaterra, Catalonia, Spain; Visual Tagging Services, Campus UAB, 08193 Bellaterra, Catalonia, Spain
Pseudocode Yes Algorithm 1: Orthogonal Regularization Step. (A hedged re-implementation sketch of this step is given after this table.)
Open Source Code No Our code is based on the train-a-digit-classifier example included in torch/demos (https://github.com/torch/demos), which uses an upsampled version of the dataset (32×32). The only pre-processing applied to the data is a global standardization. The model is trained with SGD and a batch size of 200 during 200 epochs. Neither momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove the regularization only affects their angle.
Open Datasets Yes first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge. ... we first train a three-hidden-layer Multi-Layer Perceptron (MLP) with ReLU non-linearities on the MNIST dataset (LeCun et al. (1998)).
Dataset Splits Yes Figure 4: (a) The evolution of the error rate on the MNIST validation set for different regularization magnitudes. It can be seen that for γ = 1 it reaches the best error rate (1.45%) while the unregularized counterpart (γ = 0) is 1.74%.
Hardware Specification Yes We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Tesla K40 GPU and a GTX TITAN GPU, used for this research.
Software Dependencies No Our code is based on the train-a-digit-classifier example included in torch/demos (https://github.com/torch/demos) ... The experiment is based on a Torch implementation of the 28-layer, width-factor-10 wide residual network model ... No specific version numbers are provided for Torch or any other libraries.
Experiment Setup Yes The model is trained with SGD and a batch size of 200 during 200 epochs. Neither momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove the regularization only affects their angle. ... Figure 4a shows that the model effectively achieves the best error rate for the highest gamma value (γ = 1), thus proving the advantages of the regularization. ... The regularization coefficient γ was chosen using grid search, although similar values were found for all the experiments, especially if regularization gradients are normalized before adding them to the weights. (A sketch of this setup follows the table.)
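
As a companion to the "Pseudocode" row, the following is a minimal PyTorch sketch of the locally constrained decorrelation step described in the paper: it penalizes only positively correlated filter pairs and then restores each filter's original magnitude so the update changes only the angles between weight vectors. The function name `ortho_reg_step`, the squared-cosine penalty, and the whole-tensor gradient normalization are our assumptions for illustration; the authors' exact squashing function and their Torch/Lua implementation may differ.

```python
import torch
import torch.nn.functional as F

def ortho_reg_step(weight, gamma=1.0, keep_norm=True):
    """Hypothetical OrthoReg-style update for a weight tensor of shape (n_filters, ...).

    Only positively correlated filter pairs are penalized; negatively
    correlated pairs are left untouched. If keep_norm is True, each filter's
    original magnitude is restored so that only its angle changes.
    """
    flat0 = weight.detach().reshape(weight.size(0), -1)   # one row per filter
    norms = flat0.norm(dim=1, keepdim=True)               # original magnitudes

    with torch.enable_grad():
        flat = flat0.clone().requires_grad_(True)
        unit = F.normalize(flat, dim=1)                    # unit-norm filters
        cos = unit @ unit.t()                              # pairwise cosine similarities
        mask = 1.0 - torch.eye(cos.size(0), device=cos.device)
        pos = torch.clamp(cos * mask, min=0.0)             # drop self-pairs and negative correlations
        loss = 0.5 * (pos ** 2).sum()                      # assumed squared-cosine penalty
        grad, = torch.autograd.grad(loss, flat)

    # Normalize the regularization gradient before applying it (the excerpt notes
    # this makes the choice of gamma less sensitive across experiments).
    grad = grad / (grad.norm() + 1e-12)
    new_flat = flat0 - gamma * grad
    if keep_norm:                                          # recover the original magnitudes
        new_flat = F.normalize(new_flat, dim=1) * norms
    weight.data.copy_(new_flat.reshape(weight.shape))
```

In a training loop this would typically be called on each layer's weight matrix right after the optimizer update, as in the next sketch.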
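
The "Open Source Code" and "Experiment Setup" rows describe the MNIST proof-of-concept configuration: inputs upsampled to 32×32, a single global standardization, SGD with batch size 200 for 200 epochs, no momentum or weight decay, and weight magnitudes restored after every regularization step. The sketch below wires those settings together in PyTorch; the use of torchvision, the normalization statistics, the hidden-layer width, the learning rate, and the choice to regularize every linear layer are our assumptions rather than details from the excerpts, and it reuses the hypothetical `ortho_reg_step` from the previous sketch.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Pre-processing as described: upsample MNIST to 32x32 and apply a global
# standardization (the mean/std values below are commonly used estimates).
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=200, shuffle=True)

# Three-hidden-layer MLP with ReLU non-linearities; the width (1024) and the
# learning rate are placeholders, not values reported in the excerpts.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # no momentum, no weight decay
criterion = nn.CrossEntropyLoss()
gamma = 1.0  # regularization magnitude; the excerpt reports gamma chosen by grid search

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        # Decorrelation step after every SGD update; magnitudes are recovered
        # inside ortho_reg_step so only the angles of the weights change.
        for layer in model:
            if isinstance(layer, nn.Linear):
                ortho_reg_step(layer.weight, gamma=gamma)
```

According to the "Dataset Splits" row, γ = 1 gave the best MNIST validation error in the paper (1.45% versus 1.74% for the unregularized baseline), which is why it is used as the default value here.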