Data-dependent Initializations of Convolutional Neural Networks

Authors: Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement our initialization and all experiments in the open-source deep learning framework Caffe (Jia et al., 2014). To assess how easily a network can be fine-tuned with limited data, we use the classification and detection challenges in PASCAL VOC 2007 (Everingham et al., 2014), which contains 5011 images for training and 4952 for testing. ... Table 1: Classification performance of various initializations, training algorithms and with and without batch normalization (BN) on PASCAL VOC2007 for both random Gaussian (Gaus.) and kmeans (k-mns.) initialized weights.
Researcher Affiliation | Academia | Philipp Krähenbühl (1), Carl Doersch (1,2), Jeff Donahue (1), Trevor Darrell (1); (1) Department of Electrical Engineering and Computer Science, UC Berkeley; (2) Machine Learning Department, Carnegie Mellon; {philkr,jdonahue,trevor}@eecs.berkeley.edu; cdoersch@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: Within-layer initialization. ... Algorithm 2: Between-layer normalization. (A minimal Python sketch of the within-layer idea follows the table.)
Open Source Code | Yes | Code available: https://github.com/philkr/magic_init
Open Datasets | Yes | To assess how easily a network can be fine-tuned with limited data, we use the classification and detection challenges in PASCAL VOC 2007 (Everingham et al., 2014)...
Dataset Splits | No | The paper states that PASCAL VOC 2007 'contains 5011 images for training and 4952 for testing' and mentions using '100 images of the VOC 2007 validation set' for one estimation step. For ImageNet it refers to 'training and validation sets' and plots validation loss, but it never gives explicit sizes or percentages for the validation splits of either dataset, so the splits cannot be fully reproduced from the paper alone.
Hardware Specification | Yes | The total training takes one hour on a Titan X GPU for CaffeNet. ... Training and evaluation took roughly 8 hours on a Titan X GPU for CaffeNet.
Software Dependencies | No | We implement our initialization and all experiments in the open-source deep learning framework Caffe (Jia et al., 2014). (No specific version numbers are provided for Caffe or any other software dependencies.)
Experiment Setup | Yes | We optimize each network via Stochastic Gradient Descent (SGD) for 80,000 iterations with an initial learning rate of 0.001 (dropped by 0.5 every 10,000 iterations), batch size of 10, and momentum of 0.9. (A sketch of this learning-rate schedule follows the table.)
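The pseudocode row above refers to Algorithm 1 (within-layer initialization) and Algorithm 2 (between-layer normalization). As a rough illustration of the within-layer idea, the sketch below rescales randomly drawn weights so that pre-activations computed on a small data batch have zero mean and unit variance per output unit. The function name, the fully connected setting, and the exact normalization target are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def within_layer_init(W, b, X, eps=1e-8):
    """Hypothetical sketch of a data-dependent, within-layer rescaling:
    scale each output unit of a randomly initialized affine layer so that
    its responses on a data batch X have zero mean and unit variance."""
    z = X @ W.T + b                 # pre-activations on the sample batch
    mu = z.mean(axis=0)             # per-unit empirical mean
    sigma = z.std(axis=0) + eps     # per-unit empirical std (avoid /0)
    return W / sigma[:, None], (b - mu) / sigma

# Toy usage: a 256-unit layer initialized from 100 random 512-d samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 512))
W = 0.01 * rng.standard_normal((256, 512))
b = np.zeros(256)
W0, b0 = within_layer_init(W, b, X)
```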
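For the experiment-setup row, the reported optimization schedule (SGD for 80,000 iterations, base learning rate 0.001 halved every 10,000 iterations, batch size 10, momentum 0.9) amounts to a simple step-decay rule. The helper below is only a sketch of that schedule under this reading, not the paper's Caffe solver configuration.

```python
def step_decay_lr(iteration, base_lr=0.001, gamma=0.5, step_size=10_000):
    """Learning rate after `iteration` SGD steps under the reported
    schedule: start at 1e-3 and halve every 10,000 iterations."""
    return base_lr * gamma ** (iteration // step_size)

# e.g. step_decay_lr(0) == 0.001, step_decay_lr(25_000) == 0.00025,
# and the rate near the final iteration 79,999 is 0.001 * 0.5**7 ≈ 7.8e-6.
```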