Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Authors: Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation)."
Researcher Affiliation | Industry | Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens (Google Inc.)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "An implementation of the proposed model will be made available at https://github.com/tensorflow/models/tree/master/research/deeplab."
Open Datasets | Yes | "We demonstrate the effectiveness of our proposed method on three dense prediction tasks that are well studied in the literature: scene parsing (Cityscapes [18]), person part segmentation (PASCAL-Person-Part [16]), and semantic image segmentation (PASCAL VOC 2012 [24])."
Dataset Splits | Yes | "We train the best learned DPC with MobileNet-v2 [74] and modified Xception [17, 67, 14] as network backbones on Cityscapes training set [18] and evaluate on the validation set." and "select the top 50 architectures (w.r.t. validation set performance) for re-ranking based on fine-tuning the entire model using MobileNet-v2 network backbone."
Hardware Specification | Yes | "For example, if one fine-tunes the entire model with a single dense prediction cell (DPC) on the Cityscapes dataset, then training a candidate architecture with 90K iterations requires 1+ week with a single P100 GPU."
Software Dependencies | No | The paper points to TensorFlow indirectly via a GitHub link, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "The training protocol employs a polynomial learning rate [56] with an initial learning rate of 0.01, large crop sizes (e.g., 769×769 on Cityscapes and 513×513 on PASCAL images), fine-tuned batch normalization parameters [40] and small batch training (batch size = 8, 16 for proxy and real tasks, respectively)."
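The quoted setup specifies only the schedule family (polynomial, following [56]) and the initial rate of 0.01. A minimal sketch of such a schedule is below; note that the decay power of 0.9 is an assumption taken from common DeepLab-style training practice, not a value stated in the quote.

```python
def poly_lr(step, max_steps, base_lr=0.01, power=0.9):
    """Polynomial learning-rate decay: lr = base_lr * (1 - step/max_steps)**power.

    base_lr=0.01 matches the paper's quoted initial learning rate;
    power=0.9 is an assumed value typical of DeepLab-style protocols.
    """
    return base_lr * (1.0 - step / max_steps) ** power

# Example over a 90K-iteration budget (the fine-tuning length quoted above):
start = poly_lr(0, 90_000)       # 0.01 at the first step
mid = poly_lr(45_000, 90_000)    # roughly 0.0054 at the midpoint
end = poly_lr(90_000, 90_000)    # 0.0 at the final step
```

The schedule decays smoothly to zero over the training run, which is why the total iteration count must be fixed in advance.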