Reducing Overfitting in Deep Networks by Decorrelating Representations

Authors: Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across a range of datasets and network architectures show that this loss always reduces overfitting (as indicated by the difference between train and val performance) and improves generalization.
Researcher Affiliation | Collaboration | Michael Cogswell (Virginia Tech, Blacksburg, VA, cogswell@vt.edu); Faruk Ahmed (Université de Montréal, Montréal, Quebec, Canada, faruk.ahmed@umontreal.ca); Ross Girshick (Facebook AI Research (FAIR), Seattle, WA, rbg@fb.com); Larry Zitnick (Microsoft Research, Seattle, WA, larryz@microsoft.com); Dhruv Batra (Virginia Tech, Blacksburg, VA, dbatra@vt.edu)
Pseudocode | No | The paper describes the mathematical formulation of the DeCov loss but does not include any pseudocode or algorithm blocks. (An illustrative sketch of that formulation is given after this table.)
Open Source Code | No | The paper mentions using Caffe implementations and a Caffe Model Zoo link (https://gist.github.com/mavenlin/d802a5849de39225bcc6) for existing models, but it does not state that the authors' own implementation of the DeCov method is open source or provide a link to their code.
Open Datasets | Yes | Our experiments encompass a range of datasets (MNIST (LeCun et al., 1995), CIFAR10/100 (Krizhevsky & Hinton, 2009), ImageNet (Deng et al., 2009))
Dataset Splits | Yes | Hyper-parameters (loss weights for DeCov and weight decay) are chosen by grid search on the standard train/val split. (CIFAR10) ... We use the same architecture as the base architecture for CIFAR10 and hold out the last 10,000 of the 50,000 train examples for validation. (CIFAR100) ... The last 50,000 of the ILSVRC 2012 train images are held out for validation. (ImageNet)
Hardware Specification | No | The paper mentions 'Faster computers' and 'GPU support by NVIDIA' in general terms, and states 'Using cuDNNv3, AlexNet with 128x128 inputs takes 103ms averaged over 50 runs to compute a forward and backward pass.' (Footnote 1), but it does not provide the specific CPU or GPU models used for the experiments.
Software Dependencies | Yes | Our implementation comes from Caffe. ... Using cuDNNv3, AlexNet with 128x128 inputs takes 103ms averaged over 50 runs to compute a forward and backward pass. (Footnote 1)
Experiment Setup | Yes | Note that we set the Dropout rate to 0.5 as suggested by Srivastava et al. (2014). ... Hyper-parameters (loss weights for DeCov and weight decay) are chosen by grid search on the standard train/val split. ... The best DeCov weight (0.1) is consistent for a range of hidden activation sizes in this dataset. ... Our implementation comes from Caffe. In particular, it uses a fixed schedule that multiplies the learning rate by 1/10 every 100,000 iterations. ... We do not use early stopping and do not perform color augmentation. (A hedged sketch of this setup appears after the table.)
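
For reference, the DeCov loss the paper formulates penalizes the squared off-diagonal entries of the mini-batch covariance matrix of a hidden layer's activations. The NumPy sketch below is a minimal illustration of that formulation only; the function name, shapes, and use of NumPy are assumptions and do not reflect the authors' Caffe implementation.

```python
import numpy as np

def decov_loss(activations):
    """Illustrative DeCov-style penalty on a batch of hidden activations.

    activations: array of shape (batch_size, num_units).
    Returns 0.5 * (||C||_F^2 - ||diag(C)||_2^2), where C is the
    covariance of the activations computed over the mini-batch.
    """
    # Center each unit's activations over the mini-batch.
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Batch covariance matrix C (num_units x num_units).
    cov = centered.T @ centered / activations.shape[0]
    # Penalize squared off-diagonal covariances only.
    frob_sq = np.sum(cov ** 2)
    diag_sq = np.sum(np.diag(cov) ** 2)
    return 0.5 * (frob_sq - diag_sq)

# Example: 128 samples, 64 hidden units.
h = np.random.randn(128, 64)
print(decov_loss(h))
```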
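
The experiment-setup details quoted above (grid search over the DeCov weight and weight decay on a held-out validation split, and a fixed schedule that multiplies the learning rate by 1/10 every 100,000 iterations) could be sketched as follows. The grid values, base learning rate, and the train_and_validate helper are hypothetical placeholders, not the authors' actual Caffe configuration.

```python
# Hypothetical sketch of the setup described above; the grids and the
# train_and_validate helper are placeholders, not the authors' configuration.

def step_lr(base_lr, iteration, step_size=100_000, gamma=0.1):
    # Fixed schedule: multiply the learning rate by 1/10 every 100,000 iterations.
    return base_lr * (gamma ** (iteration // step_size))

def grid_search(train_and_validate,
                decov_weights=(0.0, 0.001, 0.01, 0.1),  # assumed grid
                weight_decays=(0.0, 0.0005, 0.005)):    # assumed grid
    # Choose the (DeCov weight, weight decay) pair with the best validation accuracy.
    best = None
    for dw in decov_weights:
        for wd in weight_decays:
            val_acc = train_and_validate(decov_weight=dw, weight_decay=wd)
            if best is None or val_acc > best[0]:
                best = (val_acc, dw, wd)
    return best
```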