Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
Authors: Wieland Brendel, Matthias Bethge
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 × 33 px features and AlexNet performance for 17 × 17 px features). The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. (See the architecture sketch below the table.) |
| Researcher Affiliation | Academia | Wieland Brendel and Matthias Bethge; Eberhard Karls University of Tübingen, Germany; Werner Reichardt Centre for Integrative Neuroscience, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Tübingen, Germany; {wieland.brendel, matthias.bethge}@bethgelab.org |
| Pseudocode | No | The paper includes a figure detailing the network architecture but no formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We released the pretrained BagNets (BagNet-9, BagNet-17 and BagNet-33) for PyTorch and Keras at https://github.com/wielandbrendel/bag-of-local-features-models. |
| Open Datasets | Yes | We train the BagNets directly on ImageNet (see Appendix for details). |
| Dataset Splits | Yes | Training of the models was performed in PyTorch using the default ImageNet training script of Torchvision (https://github.com/pytorch/vision, commit 8a4786a) with default parameters. In brief, we used SGD with momentum (0.9), a batch size of 256 and an initial learning rate of 0.01, which we decreased by a factor of 10 every 30 epochs. Images were resized to 256 pixels (shortest side), after which we extracted a random crop of size 224 × 224 pixels. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions PyTorch and Keras and refers to a specific Torchvision commit, but does not specify version numbers for these key software components. |
| Experiment Setup | Yes | In brief, we used SGD with momentum (0.9), a batch size of 256 and an initial learning rate of 0.01, which we decreased by a factor of 10 every 30 epochs. (See the training sketch below the table.) |
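
The quoted abstract describes the core idea compactly: class evidence is computed from small local patches and then averaged with no regard for where each patch sits. Below is a minimal PyTorch sketch of that idea. The class name, layer widths and stride are hypothetical stand-ins; the paper's actual BagNets are ResNet-50 variants with receptive fields limited to 9 × 9, 17 × 17 or 33 × 33 pixels, not this two-layer encoder.

```python
import torch
import torch.nn as nn

class BagOfLocalFeatures(nn.Module):
    """Toy bag-of-local-features classifier: per-patch class logits,
    averaged over all spatial positions so spatial ordering is lost."""

    def __init__(self, num_classes: int = 1000, patch_size: int = 33):
        super().__init__()
        # Stand-in local encoder: its receptive field is a single patch.
        # The paper instead derives this from ResNet-50.
        self.local_encoder = nn.Sequential(
            nn.Conv2d(3, 256, kernel_size=patch_size, stride=8),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 2048, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # A 1x1 convolution turns each local feature into class evidence.
        self.local_classifier = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits_per_patch = self.local_classifier(self.local_encoder(x))  # (B, C, H', W')
        # The "bag" step: average class evidence over space, discarding
        # where each local feature occurred.
        return logits_per_patch.mean(dim=(2, 3))  # (B, C)

model = BagOfLocalFeatures()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 1000])
```

The actual pretrained BagNets are available from the repository linked under "Open Source Code"; its README documents the PyTorch and Keras entry points.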
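
The training recipe quoted under "Dataset Splits" and "Experiment Setup" maps directly onto standard PyTorch/torchvision components. Here is a sketch under those stated hyperparameters; the dataset path, the stand-in ResNet-50 model, the epoch count and the normalization constants are assumptions, not part of the quote.

```python
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

# Preprocessing as quoted: shortest side resized to 256, random 224x224 crop.
# The normalization constants are the standard ImageNet statistics used by
# torchvision's training pipeline (not stated in the quote).
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "/path/to/imagenet/train" is a placeholder for a local ImageNet copy.
train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=train_transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

# Placeholder model: a plain ResNet-50 stands in for the BagNet variants,
# which are not part of torchvision.
model = models.resnet50()
criterion = nn.CrossEntropyLoss()

# SGD with momentum 0.9 and initial learning rate 0.01, as quoted;
# the learning rate is divided by 10 every 30 epochs.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):  # epoch count is an assumption; the quote does not state it
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```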