Towards Image Understanding from Deep Compression Without Decoding

Authors: Robert Torfason, Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool

ICLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Our study shows that accuracies comparable to networks that operate on decompressed RGB images can be achieved while reducing the computational complexity by up to 2×. Furthermore, we show that synergies are obtained by jointly training compression networks with classification networks on the compressed representations, improving image quality, classification accuracy, and segmentation performance.
Researcher Affiliation Collaboration Robert Torfason (ETH Zurich, Merantix); Fabian Mentzer (ETH Zurich); Eirikur Agustsson (ETH Zurich); Michael Tschannen (ETH Zurich); Radu Timofte (ETH Zurich, Merantix); Luc Van Gool (ETH Zurich, KU Leuven)
Pseudocode No The paper describes architectures and procedures in text and tables but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper mentions adapting existing code: 'Our implementation is adapted using the codes from DeepLab-ResNet-TensorFlow. https://github.com/DrSleep/tensorflow-deeplab-resnet'. However, it does not provide a statement or link indicating that the authors' own code for the work described in the paper has been open-sourced.
Open Datasets Yes We use the ImageNet dataset from the Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2014) to train our image classification networks and our compression network. The PASCAL VOC-2012 dataset (Everingham et al., 2015) was used for the semantic segmentation tasks.
Dataset Splits Yes It consists of 1.28 million training images and 50k validation images, distributed across 1000 diverse classes. The original dataset is furthermore augmented with extra annotations provided by Hariharan et al. (2011), so the final dataset has 10,582 images for training and 1,449 images for validation.
Hardware Specification Yes All benchmarks were run on a GeForce Titan X GPU in TensorFlow v1.3.
Software Dependencies Yes All benchmarks were run on a GeForce Titan X GPU in TensorFlow v1.3.
Experiment Setup Yes For training we use the standard hyperparameters and a slightly modified pre-processing procedure from He et al. (2015), described in detail in Appendix A.4. We use a batch size of 64, employ the linear scaling rule from Goyal et al. (2017), and use a learning rate of 0.025. We employ the same learning rate schedule as in He et al. (2015), but for faster training iterations we decay the learning rate 3.75× faster: the learning rate is divided by a factor of 10 at epochs 8, 16, and 24, and we train for a total of 28 epochs. A stochastic gradient descent (SGD) optimizer is used with momentum 0.9 and a weight decay of 0.0001. For pre-processing we do random mirroring of inputs, random cropping of inputs (224×224 for RGB images, 28×28 for compressed representations), and center the images using the per-channel mean over the ImageNet dataset.

For the training of the segmentation architecture we use the same settings as in Chen et al. (2016) with a slightly modified pre-processing procedure as described in Appendix A.5. We use a batch size of 10 and perform 20k training iterations using an SGD optimizer with momentum 0.9. The initial learning rate is 0.001 (0.01 for the final classification layer) and the learning rate policy is polynomial decay: at each step the initial learning rate is multiplied by (1 − iter/max_iter)^0.9. We use a weight decay of 0.0005. For pre-processing we do random mirroring of inputs, random cropping of inputs (320×320 for RGB images, 40×40 for the compressed representation), and center the images using the per-channel mean over the dataset.

For joint training we set the hyperparameters in Eq. 2 to γ = 0.001, β = 150, and Ht = 1.265 for the 0.635 bpp operating point, and γ = 0.001, β = 600, and Ht = 0.8 for the 0.0983 bpp operating point. The learning rate schedule is similar to the one used in the image classification setting: it starts with an initial learning rate of 0.0025 that is divided by 10 every 3 epochs, using an SGD optimizer with momentum 0.9. The joint network is then trained for a total of 9 epochs.
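The three learning-rate schedules quoted above can be sketched as plain functions. This is a minimal illustration, not the authors' code: the hyperparameter values (base learning rates, drop epochs, the poly exponent 0.9, 20k max iterations) come from the quoted setup, while the function names and structure are my own.

```python
def classification_lr(epoch, base_lr=0.025):
    """Step schedule for classification: base LR divided by 10
    at epochs 8, 16, and 24; training runs for 28 epochs total."""
    drops = sum(epoch >= boundary for boundary in (8, 16, 24))
    return base_lr / (10 ** drops)


def segmentation_lr(iteration, base_lr=0.001, max_iter=20000, power=0.9):
    """Poly schedule for segmentation: the initial LR is multiplied
    by (1 - iter/max_iter)**power at each step."""
    return base_lr * (1.0 - iteration / max_iter) ** power


def joint_lr(epoch, base_lr=0.0025):
    """Joint-training schedule: initial LR 0.0025, divided by 10
    every 3 epochs, for 9 epochs total."""
    return base_lr / (10 ** (epoch // 3))


if __name__ == "__main__":
    print(classification_lr(0))    # start of training
    print(classification_lr(20))   # after the drops at epochs 8 and 16
    print(segmentation_lr(10000))  # halfway through the 20k iterations
    print(joint_lr(8))             # final epoch of joint training
```

Note that all three share SGD with momentum 0.9; only the decay policy differs (step decay for classification and joint training, polynomial decay for segmentation).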