Automatic Discovery and Optimization of Parts for Image Classification

Authors: Sobhan Naderi Parizi, Andrea Vedaldi, Andrew Zisserman, and Pedro Felzenszwalb

ICLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments with both HOG (Dalal & Triggs (2005)) and CNN (Krizhevsky et al. (2012)) features and improve the state-of-the-art results on the MIT-indoor dataset (Quattoni & Torralba (2009)) using CNN features.
Researcher Affiliation | Academia | Brown University; University of Oxford
Pseudocode | Yes | Algorithm 1: Joint training of model parameters by optimizing $O(u, w)$ in Equation 6. Algorithm 2: Fast optimization of the convex bound $B_u(w, w_{\mathrm{old}})$ using hard example mining. Algorithm 3: Fast QP solver for optimizing $B_C$.
Open Source Code | No | The paper mentions a third-party tool, Caffe, but does not state that its own source code for the described method is released, nor does it link to a code repository.
Open Datasets | Yes | We evaluate our methods on the MIT-indoor dataset (Quattoni & Torralba (2009)). The hybrid network is pre-trained on images from ImageNet (Deng et al. (2009)) and PLACES (Zhou et al. (2014)) datasets.
Dataset Splits | No | The paper states: "The dataset has 67 indoor scene classes. There are about 80 training and 20 test images per class." Although it mentions training and test sets, it does not specify a separate validation split or give explicit percentages or counts for the partition, nor does it cite a predefined split.
Hardware Specification | Yes | In our current implementation it takes 5 days to do joint training with 120 shared parts on the full MIT-indoor dataset on a 16-core machine using HOG features. It takes 2.5 days to do joint training with 372 parts on the full dataset on an 8-core machine using 60-dimensional PCA-reduced CNN features.
Software Dependencies | No | The paper states: "We extract CNN features using Caffe (Jia et al. (2014))." It mentions Caffe but gives no version number for it or for any other software dependency.
Experiment Setup | Yes | HOG features: We resize images (maintaining aspect ratio) to have about 2.5M pixels. We extract 32-dimensional HOG features... at multiple scales. Our HOG pyramid has 3 scales per octave... Each part filter $w_j$ models a 6×6 grid of HOG features... CNN features: We extract CNN features at multiple scales from overlapping patches of fixed size 256×256 and with stride value 256/3 ≈ 85. We resize images (maintaining aspect ratio) to have about 5M pixels in the largest scale. We use a scale pyramid with 2 scales per octave.
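The Pseudocode entry names Algorithm 2, which optimizes a convex bound using hard example mining. The paper's algorithm is not reproduced on this page. The block below is only a generic sketch of cache-based hard example mining for a linear hinge-loss classifier. It swaps in a plain subgradient solver as a stand-in for the paper's QP solver (Algorithm 3) and omits cache eviction. The function names `train_on_cache`, `mine_hard_examples` and `hard_example_mining` are hypothetical.

```python
import numpy as np

def train_on_cache(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimize lam/2 * ||w||^2 + mean hinge loss over the cached examples.
    Stand-in subgradient solver (assumption); the paper uses its own QP solver."""
    w = np.zeros(X.shape[1])
    for t in range(1, epochs + 1):
        active = y * (X @ w) < 1  # examples with nonzero hinge loss
        grad = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        w -= (lr / np.sqrt(t)) * grad  # decaying step size
    return w

def mine_hard_examples(X, y, w):
    """Indices of all examples that violate the margin under the current model."""
    return np.flatnonzero(y * (X @ w) < 1)

def hard_example_mining(X, y, init_size=20, max_rounds=10, seed=0):
    """Alternate between training on a cache and growing it with hard examples."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(y), size=min(init_size, len(y)), replace=False)
    cache = set(start.tolist())
    w = np.zeros(X.shape[1])
    for _ in range(max_rounds):
        idx = np.array(sorted(cache))
        w = train_on_cache(X[idx], y[idx])
        new = set(mine_hard_examples(X, y, w).tolist()) - cache
        if not new:  # no hard examples outside the cache: stop
            break
        cache |= new  # eviction of easy examples is omitted in this sketch
    return w, cache
```

The appeal of this pattern is that the solver only ever sees a small cache, while the full training set is scanned cheaply for margin violations between solves.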
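The two timings in the Hardware Specification entry use different machines, so a rough comparison first converts them to core-days. This conversion assumes that every core was busy for the whole run and that cost scales linearly with core count. The paper reports neither assumption, so the numbers are illustrative only.

```python
# Reported runs from the Hardware Specification entry: (wall-clock days, cores).
runs = {
    "HOG, 120 shared parts": (5.0, 16),
    "CNN (60-d PCA), 372 parts": (2.5, 8),
}

def core_days(days, cores):
    """Wall-clock days times core count (assumes full, linear use of all cores)."""
    return days * cores

for name, (days, cores) in runs.items():
    print(f"{name}: {core_days(days, cores):g} core-days")
```

Under these assumptions the HOG run costs 5 × 16 = 80 core-days and the CNN run costs 2.5 × 8 = 20 core-days.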
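The Experiment Setup entry gives enough numbers to sketch the CNN patch layout:

- a pixel budget for the largest scale;
- 2 scales per octave;
- 256×256 patches at stride 85.

The code below is a minimal sketch, not the authors' code. It assumes plain rounding of image sizes and places patches only where they fit entirely inside the image. It also assumes a stopping rule for the pyramid: stop once the shorter side is smaller than one patch. None of these three choices is stated in the paper, and the helper names are hypothetical.

```python
import math

def resize_to_pixel_budget(height, width, target_pixels):
    """Scale factor and new size so height * width is about target_pixels,
    keeping the aspect ratio."""
    s = math.sqrt(target_pixels / (height * width))
    return s, round(height * s), round(width * s)

def pyramid_sizes(height, width, scales_per_octave, min_side):
    """Sizes of a pyramid whose consecutive levels shrink by 2**(-1/scales_per_octave).
    Stops once the shorter side drops below min_side (assumed stopping rule)."""
    step = 2 ** (-1.0 / scales_per_octave)
    sizes, level = [], 0
    while True:
        f = step ** level
        h, w = round(height * f), round(width * f)
        if min(h, w) < min_side:
            break
        sizes.append((h, w))
        level += 1
    return sizes

def patch_corners(height, width, patch=256, stride=85):
    """Top-left corners of patch x patch windows lying fully inside the image."""
    ys = range(0, height - patch + 1, stride)
    xs = range(0, width - patch + 1, stride)
    return [(y, x) for y in ys for x in xs]

if __name__ == "__main__":
    # Hypothetical 1200x1600 input, CNN setting: about 5M pixels, 2 scales per octave.
    s, H, W = resize_to_pixel_budget(1200, 1600, 5e6)
    for h, w in pyramid_sizes(H, W, scales_per_octave=2, min_side=256):
        print(h, w, len(patch_corners(h, w)))
```

The same helpers cover the HOG setting, which uses about 2.5M pixels and 3 scales per octave. For scale, a 6×6 part filter over 32-dimensional HOG cells has 6 × 6 × 32 = 1152 weights.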