Reducing Overfitting in Deep Networks by Decorrelating Representations
Authors: Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across a range of datasets and network architectures show that this loss always reduces overfitting (as indicated by the difference between train and val performance) and leads to better generalization. |
| Researcher Affiliation | Collaboration | Michael Cogswell, Virginia Tech, Blacksburg, VA (cogswell@vt.edu); Faruk Ahmed, Université de Montréal, Montréal, Quebec, Canada (faruk.ahmed@umontreal.ca); Ross Girshick, Facebook AI Research (FAIR), Seattle, WA (rbg@fb.com); Larry Zitnick, Microsoft Research, Seattle, WA (larryz@microsoft.com); Dhruv Batra, Virginia Tech, Blacksburg, VA (dbatra@vt.edu) |
| Pseudocode | No | The paper describes the mathematical formulation of the DeCov loss but does not include any pseudocode or algorithm blocks (a minimal sketch of that formulation is given after this table). |
| Open Source Code | No | The paper mentions using Caffe implementations and a Caffe Model Zoo link (https://gist.github.com/mavenlin/d802a5849de39225bcc6) for existing models, but it does not state that the authors' own implementation of the DeCov method is open-source or provide a link to their code. |
| Open Datasets | Yes | Our experiments encompass a range of datasets (MNIST (LeCun et al., 1995), CIFAR10/100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009)) |
| Dataset Splits | Yes | Hyper-parameters (loss weights for De Cov and weight decay) are chosen by grid search on the standard train/val split. (CIFAR10) ... We use the same architecture as the base architecture for CIFAR10 and hold out the last 10,000 of the 50,000 train examples for validation. (CIFAR100) ... The last 50,000 of the ILSVRC 2012 train images are held out for validation. (ImageNet) |
| Hardware Specification | No | The paper mentions 'Faster computers' and 'GPU support by NVIDIA' in general terms, and states 'Using cuDNNv3, AlexNet with 128x128 inputs takes 103ms averaged over 50 runs to compute a forward and backward pass.' (Footnote 1) but does not provide specific CPU or GPU models used for the experiments. |
| Software Dependencies | Yes | Using cuDNNv3, AlexNet with 128x128 inputs takes 103ms averaged over 50 runs to compute a forward and backward pass. (Footnote 1) ... Our implementation comes from Caffe. |
| Experiment Setup | Yes | Note that we set the Dropout rate to 0.5 as suggested by Srivastava et al. (2014). ... Hyper-parameters (loss weights for DeCov and weight decay) are chosen by grid search on the standard train/val split. ... The best DeCov weight (0.1) is consistent for a range of hidden activation sizes in this dataset... Our implementation comes from Caffe. In particular, it uses a fixed schedule that multiplies the learning rate by 1/10 every 100,000 iterations (a sketch of this schedule follows the table)... We do not use early stopping and do not perform color augmentation. |
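
Since the paper gives only the mathematical form of the DeCov loss and no pseudocode, the following is a minimal NumPy sketch of that formulation: penalize the squared off-diagonal entries of the batch covariance of a layer's activations. The function name `decov_loss` and the NumPy setting are illustrative assumptions, not the authors' Caffe implementation.

```python
import numpy as np

def decov_loss(h):
    """Sketch of the DeCov penalty for a batch of hidden activations
    h with shape (N, d): half the squared Frobenius norm of the batch
    covariance matrix, excluding its diagonal (the per-unit variances)."""
    n = h.shape[0]
    centered = h - h.mean(axis=0, keepdims=True)   # subtract per-unit batch mean
    cov = centered.T @ centered / n                # d x d batch covariance C
    frob_sq = np.sum(cov ** 2)                     # ||C||_F^2
    diag_sq = np.sum(np.diag(cov) ** 2)            # ||diag(C)||_2^2
    return 0.5 * (frob_sq - diag_sq)               # off-diagonal energy only
```

In training, this term would be added to the usual cross-entropy loss with a weight chosen by grid search; the experiment-setup row above reports 0.1 as the best DeCov weight on CIFAR10.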
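
The experiment-setup row also describes Caffe's fixed "step" learning-rate policy, which multiplies the learning rate by 1/10 every 100,000 iterations. A one-line sketch of that schedule is shown below; the base learning rate in the example is an assumption, as the row does not quote one.

```python
def step_lr(base_lr, iteration, gamma=0.1, step_size=100_000):
    # Caffe-style "step" policy from the experiment setup:
    # the learning rate is multiplied by gamma every step_size iterations.
    return base_lr * gamma ** (iteration // step_size)

# e.g. step_lr(0.01, 250_000) == 0.01 * 0.1**2 == 1e-4  (0.01 is an assumed base rate)
```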