Regularizing CNNs with Locally Constrained Decorrelations

Authors: Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca

ICLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental In this work we hypothesize that regularizing negatively correlated features is an obstacle for achieving better results and we introduce OrthoReg, a novel regularization technique that addresses the performance margin issue by only regularizing positively correlated feature weights. Moreover, OrthoReg is computationally efficient since it only regularizes the feature weights, which makes it very suitable for the latest CNN models. We verify our hypothesis through a series of experiments: first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge.
Researcher Affiliation Collaboration Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca; Computer Vision Center, Univ. Autònoma de Barcelona (UAB), 08193 Bellaterra, Catalonia, Spain; Visual Tagging Services, Campus UAB, 08193 Bellaterra, Catalonia, Spain
Pseudocode Yes Algorithm 1: Orthogonal Regularization Step. (A hedged re-implementation sketch of this step is given after this table.)
Open Source Code No Our code is based on the train-a-digit-classifier example included in torch/demos (https://github.com/torch/demos), which uses an upsampled version of the dataset (32×32). The only pre-processing applied to the data is a global standardization. The model is trained with SGD and a batch size of 200 during 200 epochs. Neither momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove the regularization only affects their angle.
Open Datasets Yes first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge. ... we first train a three-hidden-layer Multi-Layer Perceptron (MLP) with ReLU non-linearities on the MNIST dataset (LeCun et al. (1998)).
Dataset Splits Yes Figure 4: (a) The evolution of the error rate on the MNIST validation set for different regularization magnitudes. It can be seen that for γ = 1 it reaches the best error rate (1.45%) while the unregularized counterpart (γ = 0) is 1.74%.
Hardware Specification Yes We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Tesla K40 GPU and a GTX TITAN GPU, used for this research.
Software Dependencies No Our code is based on the train-a-digit-classifier example included in torch/demos (https://github.com/torch/demos) ... The experiment is based on a Torch implementation of the 28-layer, width-factor-10 wide residual network model ... No specific version numbers are provided for Torch or any other libraries.
Experiment Setup Yes The model is trained with SGD and a batch size of 200 during 200 epochs. Neither momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove the regularization only affects their angle. ... Figure 4a shows that the model effectively achieves the best error rate for the highest gamma value (γ = 1), thus proving the advantages of the regularization. ... The regularization coefficient γ was chosen using grid search, although similar values were found for all the experiments, especially if regularization gradients are normalized before adding them to the weights. (A sketch of this setup follows the table.)
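
As a companion to the "Pseudocode" row, the following is a minimal PyTorch sketch of the locally constrained decorrelation step described in the paper: it penalizes only positively correlated filter pairs and then restores each filter's original magnitude so the update changes only the angles between weight vectors. The function name `ortho_reg_step`, the squared-cosine penalty, and the whole-tensor gradient normalization are our assumptions for illustration; the authors' exact squashing function and their Torch/Lua implementation may differ.

```python
import torch
import torch.nn.functional as F

def ortho_reg_step(weight, gamma=1.0, keep_norm=True):
    """Hypothetical OrthoReg-style update for a weight tensor of shape (n_filters, ...).

    Only positively correlated filter pairs are penalized; negatively
    correlated pairs are left untouched. If keep_norm is True, each filter's
    original magnitude is restored so that only its angle changes.
    """
    flat0 = weight.detach().reshape(weight.size(0), -1)   # one row per filter
    norms = flat0.norm(dim=1, keepdim=True)               # original magnitudes

    with torch.enable_grad():
        flat = flat0.clone().requires_grad_(True)
        unit = F.normalize(flat, dim=1)                    # unit-norm filters
        cos = unit @ unit.t()                              # pairwise cosine similarities
        mask = 1.0 - torch.eye(cos.size(0), device=cos.device)
        pos = torch.clamp(cos * mask, min=0.0)             # drop self-pairs and negative correlations
        loss = 0.5 * (pos ** 2).sum()                      # assumed squared-cosine penalty
        grad, = torch.autograd.grad(loss, flat)

    # Normalize the regularization gradient before applying it (the excerpt notes
    # this makes the choice of gamma less sensitive across experiments).
    grad = grad / (grad.norm() + 1e-12)
    new_flat = flat0 - gamma * grad
    if keep_norm:                                          # recover the original magnitudes
        new_flat = F.normalize(new_flat, dim=1) * norms
    weight.data.copy_(new_flat.reshape(weight.shape))
```

In a training loop this would typically be called on each layer's weight matrix right after the optimizer update, as in the next sketch.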
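
The "Open Source Code" and "Experiment Setup" rows describe the MNIST proof-of-concept configuration: inputs upsampled to 32×32, a single global standardization, SGD with batch size 200 for 200 epochs, no momentum or weight decay, and weight magnitudes restored after every regularization step. The sketch below wires those settings together in PyTorch; the use of torchvision, the normalization statistics, the hidden-layer width, the learning rate, and the choice to regularize every linear layer are our assumptions rather than details from the excerpts, and it reuses the hypothetical `ortho_reg_step` from the previous sketch.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Pre-processing as described: upsample MNIST to 32x32 and apply a global
# standardization (the mean/std values below are commonly used estimates).
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=200, shuffle=True)

# Three-hidden-layer MLP with ReLU non-linearities; the width (1024) and the
# learning rate are placeholders, not values reported in the excerpts.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # no momentum, no weight decay
criterion = nn.CrossEntropyLoss()
gamma = 1.0  # regularization magnitude; the excerpt reports gamma chosen by grid search

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        # Decorrelation step after every SGD update; magnitudes are recovered
        # inside ortho_reg_step so only the angles of the weights change.
        for layer in model:
            if isinstance(layer, nn.Linear):
                ortho_reg_step(layer.weight, gamma=gamma)
```

According to the "Dataset Splits" row, γ = 1 gave the best MNIST validation error in the paper (1.45% versus 1.74% for the unregularized baseline), which is why it is used as the default value here.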