Data-dependent Initializations of Convolutional Neural Networks
Authors: Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our initialization and all experiments in the open-source deep learning framework Caffe (Jia et al., 2014). To assess how easily a network can be fine-tuned with limited data, we use the classification and detection challenges in PASCAL VOC 2007 (Everingham et al., 2014), which contains 5011 images for training and 4952 for testing. ... Table 1: Classification performance of various initializations, training algorithms and with and without batch normalization (BN) on PASCAL VOC2007 for both random Gaussian (Gaus.) and kmeans (k-mns.) initialized weights. |
| Researcher Affiliation | Academia | Philipp Krähenbühl¹, Carl Doersch¹,², Jeff Donahue¹, Trevor Darrell¹; ¹Department of Electrical Engineering and Computer Science, UC Berkeley; ²Machine Learning Department, Carnegie Mellon University. {philkr,jdonahue,trevor}@eecs.berkeley.edu; cdoersch@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Within-layer initialization. ... Algorithm 2 Between-layer normalization. (A hedged sketch of the within-layer step appears after this table.) |
| Open Source Code | Yes | Code available: https://github.com/philkr/magic_init |
| Open Datasets | Yes | To assess how easily a network can be fine-tuned with limited data, we use the classification and detection challenges in PASCAL VOC 2007 (Everingham et al., 2014)... |
| Dataset Splits | No | The paper states that 'PASCAL VOC 2007... contains 5011 images for training and 4952 for testing' and mentions using '100 images of the VOC 2007 validation set' for one estimation step. For ImageNet, it refers to 'training and validation sets' and plots validation loss, but it never gives explicit validation-split sizes or percentages for either dataset, so the splits cannot be fully reproduced. |
| Hardware Specification | Yes | The total training takes one hour on a Titan X GPU for CaffeNet. ... Training and evaluation took roughly 8 hours on a Titan X GPU for CaffeNet. |
| Software Dependencies | No | We implement our initialization and all experiments in the open-source deep learning framework Caffe (Jia et al., 2014). (No specific version numbers are provided for Caffe or any other software dependencies.) |
| Experiment Setup | Yes | We optimize each network via Stochastic Gradient Descent (SGD) for 80,000 iterations with an initial learning rate of 0.001 (dropped by 0.5 every 10,000 iterations), batch size of 10, and momentum of 0.9. (A sketch of this schedule follows the table.) |
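As context for the pseudocode row: the paper's Algorithm 1 rescales each randomly initialized layer so that, on a batch of real data, every output channel's pre-activations have zero mean and unit variance; Algorithm 2 then balances scales across layers. Below is a minimal NumPy sketch of the within-layer idea, simplified to a fully connected layer. The function name and shapes are our illustration, not the paper's code; the released magic_init implementation operates on Caffe networks and handles convolutional layers with the analogous per-channel rescaling.

```python
import numpy as np

def within_layer_init(W, b, X):
    """Rescale (W, b) so every output unit of X @ W.T + b has zero mean
    and unit variance over the batch X (the within-layer idea of
    Algorithm 1, simplified to a fully connected layer)."""
    pre = X @ W.T + b                 # (batch, out) pre-activations
    mu = pre.mean(axis=0)             # per-unit batch mean
    sigma = pre.std(axis=0) + 1e-8    # per-unit batch std (avoid div by 0)
    W = W / sigma[:, None]            # scale rows -> unit output variance
    b = (b - mu) / sigma              # shift/scale bias -> zero output mean
    return W, b

# Example: start from the random Gaussian draw the paper also evaluates
# (k-means filters are the other within-layer scheme it tests).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64))          # stands in for a batch of real data
W = rng.standard_normal((32, 64)) * 0.01    # random Gaussian weights
b = np.zeros(32)
W, b = within_layer_init(W, b, X)
print((X @ W.T + b).std(axis=0))            # ~1 for every output unit
```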
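The experiment-setup row quotes the solver hyperparameters; all of the paper's experiments run in Caffe. Purely as an illustration of that quoted schedule, and not the authors' configuration, the same numbers expressed with PyTorch's `SGD` and `StepLR` (an assumed substitute framework, with a placeholder model and data standing in for CaffeNet and PASCAL VOC) would look like:

```python
import torch
import torch.nn as nn

# Placeholder network and data; only the optimizer and schedule numbers
# below come from the paper's stated experiment setup.
model = nn.Linear(10, 2)

# SGD with base lr 0.001 and momentum 0.9, run for 80,000 iterations.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Drop the learning rate by a factor of 0.5 every 10,000 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.5)

for it in range(80_000):
    x = torch.randn(10, 10)           # batch size of 10 (placeholder data)
    loss = model(x).pow(2).mean()     # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                  # scheduler counts iterations, not epochs
```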