Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Authors: Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan Yuille

ICLR 2015

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
    "Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set." "We test our DeepLab model on the PASCAL VOC 2012 segmentation benchmark (Everingham et al., 2014), consisting of 20 foreground object classes and one background class."

Researcher Affiliation: Collaboration
    Liang-Chieh Chen, Univ. of California, Los Angeles (lcchen@cs.ucla.edu); George Papandreou, Google Inc. (gpapan@google.com); Iasonas Kokkinos, CentraleSupélec and INRIA (iasonas.kokkinos@ecp.fr); Kevin Murphy, Google Inc. (kpmurphy@google.com); Alan L. Yuille, Univ. of California, Los Angeles (yuille@stat.ucla.edu)

Pseudocode: No
    No structured pseudocode or algorithm blocks were found in the paper.

Open Source Code: Yes
    "We share our source code, configuration files, and trained models that allow reproducing the results in this paper at a companion web site https://bitbucket.org/deeplab/deeplab-public."

Open Datasets: Yes
    "We test our DeepLab model on the PASCAL VOC 2012 segmentation benchmark (Everingham et al., 2014), consisting of 20 foreground object classes and one background class. The original dataset contains 1,464, 1,449, and 1,456 images for training, validation, and testing, respectively. The dataset is augmented by the extra annotations provided by Hariharan et al. (2011), resulting in 10,582 training images."

Dataset Splits: Yes
    "The original dataset contains 1,464, 1,449, and 1,456 images for training, validation, and testing, respectively." and "We conduct the majority of our evaluations on the PASCAL val set, training our model on the augmented PASCAL train set."

Hardware Specification: Yes
    "Using our Caffe-based implementation and a Titan GPU, the resulting VGG-derived network is very efficient: Given a 306×306 input image, it produces 39×39 dense raw feature scores at the top of the network at a rate of about 8 frames/sec during testing."

Software Dependencies: No
    The paper mentions the "Caffe framework (Jia et al., 2014)" but does not specify a version number for Caffe or any other software dependencies.

Experiment Setup: Yes
    "We use a mini-batch of 20 images and initial learning rate of 0.001 (0.01 for the final classifier layer), multiplying the learning rate by 0.1 at every 2000 iterations. We use momentum of 0.9 and a weight decay of 0.0005." and "We fix the number of mean field iterations to 10 for all reported experiments." and "We use the default values of w2 = 3 and σγ = 3 and we search for the best values of w1, σα, and σβ by cross-validation on a small subset of the validation set (we use 100 images)."
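The training schedule quoted above corresponds to Caffe's "step" learning-rate policy. A minimal sketch of that policy follows, assuming the hyperparameters reported in the paper (base rate 0.001, decay factor 0.1 every 2000 iterations, with a 10× rate for the final classifier layer); the function names are illustrative, not taken from the DeepLab codebase:

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=2000):
    """Step policy: multiply base_lr by gamma every `stepsize` iterations,
    matching the paper's reported schedule (0.001, x0.1 every 2000 iters)."""
    return base_lr * (gamma ** (iteration // stepsize))

def classifier_lr(iteration):
    """The final classifier layer trains at 10x the base rate (0.01)."""
    return step_lr(iteration, base_lr=0.01)
```

For example, `step_lr(0)` gives 0.001, while `step_lr(2000)` drops to 0.0001; momentum (0.9) and weight decay (0.0005) would be set separately in the solver configuration.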