Regularizing CNNs with Locally Constrained Decorrelations
Authors: Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we hypothesize that regularizing negatively correlated features is an obstacle for achieving better results, and we introduce OrthoReg, a novel regularization technique that addresses the performance margin issue by only regularizing positively correlated feature weights. Moreover, OrthoReg is computationally efficient since it only regularizes the feature weights, which makes it very suitable for the latest CNN models. We verify our hypothesis through a series of experiments: first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge. |
| Researcher Affiliation | Collaboration | Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca. Computer Vision Center, Univ. Autònoma de Barcelona (UAB), 08193 Bellaterra, Catalonia, Spain; Visual Tagging Services, Campus UAB, 08193 Bellaterra, Catalonia, Spain |
| Pseudocode | Yes | Algorithm 1 Orthogonal Regularization Step. (A hedged sketch of this step follows the table.) |
| Open Source Code | No | Our code is based on the train-a-digit-classifier example included in torch/demos1, which uses an upsampled version of the dataset (32×32). The only pre-processing applied to the data is a global standardization. The model is trained with SGD and a batch size of 200 for 200 epochs. No momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove that the regularization only affects their angle. 1https://github.com/torch/demos |
| Open Datasets | Yes | first using MNIST as a proof of concept; secondly, we regularize wide residual networks on CIFAR-10, CIFAR-100, and SVHN (Netzer et al. (2011)), achieving the lowest error rates on these datasets to the best of our knowledge. ... we first train a three-hidden-layer Multi-Layer Perceptron (MLP) with ReLU non-linearities on the MNIST dataset (LeCun et al. (1998)). |
| Dataset Splits | Yes | Figure 4: (a) The evolution of the error rate on the MNIST validation set for different regularization magnitudes. It can be seen that for γ = 1 it reaches the best error rate (1.45%) while the unregularized counterpart (γ = 0) is 1.74%. |
| Hardware Specification | Yes | We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Tesla K40 GPU and a GTX TITAN GPU, used for this research. |
| Software Dependencies | No | Our code is based on the train-a-digit-classifier example included in torch/demos1 ... The experiment is based on a Torch implementation of the 28-layer, width-factor-10 wide residual network ... No specific version numbers are provided for Torch or any other libraries. |
| Experiment Setup | Yes | The model is trained with SGD and a batch size of 200 for 200 epochs. No momentum nor weight decay was applied. By default, the magnitude of the weights in these experiments is recovered after each regularization step in order to prove that the regularization only affects their angle. ... Figure 4a shows that the model effectively achieves the best error rate for the highest gamma value (γ = 1), thus proving the advantages of the regularization. ... The regularization coefficient γ was chosen using grid search, although similar values were found for all the experiments, especially if regularization gradients are normalized before adding them to the weights. (The second sketch below the table shows where this setup places the regularization step.) |
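The paper's Algorithm 1 is not reproduced in this report, so the following is only a minimal NumPy sketch of the idea quoted above: penalize positive pairwise cosine similarities between feature weights, normalize the regularization gradient before applying it, and restore each weight vector's magnitude so that only its angle changes. The squared-cosine penalty, the function and parameter names (`ortho_reg_step`, `gamma`, `keep_magnitude`), and the simplified gradient (which ignores the chain rule through the row normalization) are illustrative assumptions, not the paper's exact squashing function.

```python
import numpy as np

def ortho_reg_step(W, gamma=1.0, keep_magnitude=True):
    """One simplified orthogonal-regularization step (sketch, not Algorithm 1).

    W: (n_features, fan_in) weight matrix; each row is one feature detector.
    Only positive pairwise cosine similarities are penalized, so negatively
    correlated feature weights are left untouched.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)   # original row magnitudes
    U = W / norms                                      # unit-norm feature vectors
    cos = U @ U.T                                      # pairwise cosine similarities
    np.fill_diagonal(cos, 0.0)                         # ignore self-similarity
    cos = np.maximum(cos, 0.0)                         # keep positive correlations only
    grad = cos @ U                                     # grad of 0.5 * sum(cos_ij^2) w.r.t. U (simplified)
    gnorm = np.linalg.norm(grad)
    if gnorm > 0:
        grad = grad / gnorm                            # normalize before adding to the weights
    W_new = W - gamma * grad
    if keep_magnitude:
        # Restore each row's original norm so the step only affects the angle.
        W_new = W_new * (norms / np.linalg.norm(W_new, axis=1, keepdims=True))
    return W_new
```

Because the penalty is clipped at zero, the gradient vanishes for anti-correlated pairs, which is the "locally constrained" behavior the abstract describes.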
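The Experiment Setup row specifies plain SGD (batch size 200, 200 epochs, no momentum, no weight decay) with the regularization applied after each update. The sketch below shows where that step would slot into a training loop; a toy softmax layer on integer-labeled data stands in for the paper's three-hidden-layer MLP, and `train_softmax_layer` and its hyperparameter defaults are hypothetical names chosen for illustration.

```python
import numpy as np

def train_softmax_layer(X, y, lr=0.1, gamma=1.0, epochs=200, batch_size=200):
    """Plain SGD (no momentum, no weight decay) on a single softmax layer,
    with the decorrelation step interleaved after every weight update."""
    rng = np.random.default_rng(0)
    n_classes = int(y.max()) + 1
    W = rng.normal(scale=0.01, size=(n_classes, X.shape[1]))  # one detector per row
    for epoch in range(epochs):
        for start in range(0, len(X), batch_size):
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            logits = xb @ W.T
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            p[np.arange(len(yb)), yb] -= 1.0                  # softmax cross-entropy gradient
            W -= lr * (p.T @ xb) / len(yb)                    # SGD update, no momentum/decay
            W = ortho_reg_step(W, gamma=gamma)                # regularization step after each update
    return W
```

The report notes γ was chosen by grid search; in this sketch that would amount to sweeping `gamma` over a small set of values and picking the one with the lowest validation error.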