Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Authors: Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jon Shlens
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). |
| Researcher Affiliation | Industry | Liang-Chieh Chen Maxwell D. Collins Yukun Zhu George Papandreou Barret Zoph Florian Schroff Hartwig Adam Jonathon Shlens Google Inc. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | An implementation of the proposed model will be made available at https://github.com/tensorflow/models/tree/master/research/deeplab. |
| Open Datasets | Yes | We demonstrate the effectiveness of our proposed method on three dense prediction tasks that are well studied in the literature: scene parsing (Cityscapes [18]), person part segmentation (PASCAL-Person-Part [16]), and semantic image segmentation (PASCAL VOC 2012 [24]). |
| Dataset Splits | Yes | We train the best learned DPC with MobileNet-v2 [74] and modified Xception [17, 67, 14] as network backbones on the Cityscapes training set [18] and evaluate on the validation set. ... and select the top 50 architectures (w.r.t. validation set performance) for re-ranking based on fine-tuning the entire model using the MobileNet-v2 network backbone. |
| Hardware Specification | Yes | For example, if one fine-tunes the entire model with a single dense prediction cell (DPC) on the Cityscapes dataset, then training a candidate architecture with 90K iterations requires 1+ week with a single P100 GPU. |
| Software Dependencies | No | The paper mentions 'tensorflow' indirectly via a GitHub link, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The training protocol employs a polynomial learning rate [56] with an initial learning rate of 0.01, large crop sizes (e.g., 769 × 769 on Cityscapes and 513 × 513 on PASCAL images), fine-tuned batch normalization parameters [40] and small batch training (batch size = 8, 16 for proxy and real tasks, respectively). |
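The polynomial ("poly") learning-rate schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the section above only specifies an initial learning rate of 0.01 and 90K training iterations; the decay power of 0.9 is an assumption taken from common DeepLab practice.

```python
def poly_lr(step, max_steps=90_000, init_lr=0.01, power=0.9):
    """Polynomial learning-rate decay: lr = init_lr * (1 - step/max_steps)**power.

    init_lr=0.01 and max_steps=90_000 come from the paper's quoted setup;
    power=0.9 is an assumed value common in DeepLab-style training.
    """
    frac = min(max(step / max_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return init_lr * (1.0 - frac) ** power

# The schedule starts at init_lr and decays smoothly to 0 at max_steps.
print(poly_lr(0))       # 0.01 at the first step
print(poly_lr(90_000))  # 0.0 at the final step
```

A schedule like this is typically evaluated once per optimizer step and fed to SGD with momentum, which is the optimizer family used throughout the DeepLab line of work.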