Very Deep Convolutional Networks for Large-Scale Image Recognition
Authors: Karen Simonyan and Andrew Zisserman
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. |
| Researcher Affiliation | Collaboration | Karen Simonyan* & Andrew Zisserman+, Visual Geometry Group, Department of Engineering Science, University of Oxford, {karen,az}@robots.ox.ac.uk (*current affiliation: Google DeepMind; +current affiliation: University of Oxford and Google DeepMind) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision. http://www.robots.ox.ac.uk/~vgg/research/very_deep/ |
| Open Datasets | Yes | The dataset includes images of 1000 classes, and is split into three sets: training (1.3M images), validation (50K images), and testing (100K images with held-out class labels). |
| Dataset Splits | Yes | The dataset includes images of 1000 classes, and is split into three sets: training (1.3M images), validation (50K images), and testing (100K images with held-out class labels). |
| Hardware Specification | Yes | On a system equipped with four NVIDIA Titan Black GPUs, training a single net took 2–3 weeks depending on the architecture. |
| Software Dependencies | Yes | Our implementation is derived from the publicly available C++ Caffe toolbox (Jia, 2013) (branched out in December 2013). |
| Experiment Setup | Yes | The batch size was set to 256, momentum to 0.9. The training was regularised by weight decay (the L2 penalty multiplier set to 5·10⁻⁴) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5). The learning rate was initially set to 10⁻², and then decreased by a factor of 10 when the validation set accuracy stopped improving. In total, the learning rate was decreased 3 times, and the learning was stopped after 370K iterations (74 epochs). ... We consider two approaches for setting the training scale S. The first is to fix S, which corresponds to single-scale training... In our experiments, we evaluated models trained at two fixed scales: S = 256... and S = 384. |
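
The Research Type row above quotes the paper's central architectural idea: stacks of very small 3×3 convolution filters pushed to 16–19 weight layers. The sketch below illustrates the 16-weight-layer configuration (13 convolutional plus 3 fully-connected layers). It assumes PyTorch rather than the authors' Caffe-based C++ implementation, and the class and helper names are illustrative, not taken from the released models.

```python
# Minimal sketch (assumption: PyTorch, not the authors' Caffe C++ code) of a
# VGG-style stack of 3x3 convolutions; layer counts follow the paper's
# 16-weight-layer configuration (13 conv + 3 fully-connected layers).
import torch
import torch.nn as nn

# Output channels per conv layer; 'M' marks a 2x2 max-pooling layer.
CFG_16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
          512, 512, 512, 'M', 512, 512, 512, 'M']

def make_features(cfg):
    layers, in_ch = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # 3x3 filters with padding 1 preserve spatial resolution.
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

class VGGLike(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_features(CFG_16)
        # Two dropout-regularised fully-connected layers, then the classifier.
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes))

    def forward(self, x):                      # x: (N, 3, 224, 224)
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```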
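
The Experiment Setup row reports batch size 256, momentum 0.9, weight decay 5·10⁻⁴, dropout 0.5, and an initial learning rate of 10⁻² decreased tenfold when validation accuracy stops improving. Below is a hedged sketch of that optimiser configuration, again assuming PyTorch rather than the original Caffe setup; the `evaluate` helper is hypothetical.

```python
# Minimal sketch (assumption: PyTorch optimiser API, not the authors' Caffe
# setup) of the reported hyperparameters: momentum 0.9, weight decay 5e-4,
# initial learning rate 1e-2 decreased by a factor of 10 on a validation
# accuracy plateau. Dropout (0.5) lives in the model's classifier above.
import torch

model = VGGLike()                              # from the sketch above
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-2, momentum=0.9, weight_decay=5e-4)
# Factor-of-10 decay triggered by a plateau in validation accuracy ('max' mode).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1)

# Per epoch (loop bodies omitted):
#   train on mini-batches of 256 crops taken at training scale S (256 or 384)
#   val_acc = evaluate(model)                  # hypothetical helper
#   scheduler.step(val_acc)
```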