Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Authors: Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation)."
Researcher Affiliation | Industry | Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens (Google Inc.)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "An implementation of the proposed model will be made available at https://github.com/tensorflow/models/tree/master/research/deeplab."
Open Datasets | Yes | "We demonstrate the effectiveness of our proposed method on three dense prediction tasks that are well studied in the literature: scene parsing (Cityscapes [18]), person part segmentation (PASCAL-Person-Part [16]), and semantic image segmentation (PASCAL VOC 2012 [24])."
Dataset Splits | Yes | "We train the best learned DPC with MobileNet-v2 [74] and modified Xception [17, 67, 14] as network backbones on Cityscapes training set [18] and evaluate on the validation set." and "select the top 50 architectures (w.r.t. validation set performance) for re-ranking based on fine-tuning the entire model using MobileNet-v2 network backbone."
Hardware Specification | Yes | "For example, if one fine-tunes the entire model with a single dense prediction cell (DPC) on the Cityscapes dataset, then training a candidate architecture with 90K iterations requires 1+ week with a single P100 GPU."
Software Dependencies | No | The paper points to TensorFlow indirectly via a GitHub link, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "The training protocol employs a polynomial learning rate [56] with an initial learning rate of 0.01, large crop sizes (e.g., 769×769 on Cityscapes and 513×513 on PASCAL images), fine-tuned batch normalization parameters [40] and small batch training (batch size = 8, 16 for proxy and real tasks, respectively)."
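The quoted setup specifies only the schedule family (polynomial, following [56]) and the initial rate of 0.01. A minimal sketch of such a schedule is below; note that the decay power of 0.9 is an assumption taken from common DeepLab-style training practice, not a value stated in the quote.

```python
def poly_lr(step, max_steps, base_lr=0.01, power=0.9):
    """Polynomial learning-rate decay: lr = base_lr * (1 - step/max_steps)**power.

    base_lr=0.01 matches the paper's quoted initial learning rate;
    power=0.9 is an assumed value typical of DeepLab-style protocols.
    """
    return base_lr * (1.0 - step / max_steps) ** power

# Example over a 90K-iteration budget (the fine-tuning length quoted above):
start = poly_lr(0, 90_000)       # 0.01 at the first step
mid = poly_lr(45_000, 90_000)    # roughly 0.0054 at the midpoint
end = poly_lr(90_000, 90_000)    # 0.0 at the final step
```

The schedule decays smoothly to zero over the training run, which is why the total iteration count must be fixed in advance.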