Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

Authors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments demonstrate that the proposed framework is able to let both tasks mutually improve each other. On Cityscapes validation, our best model achieves 83.7% mIoU in semantic segmentation and 78.7% maximum F-score in semantic edge detection.
Researcher Affiliation Collaboration Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz... Correspondence to Zhiding Yu <zhidingy@nvidia.com>. Work partially done during an internship at NVIDIA.
Pseudocode No The paper describes algorithms and formulations using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] Going through legal approval process at this moment. Will release source code upon approval.
Open Datasets Yes Cityscapes [62] contains 2975 training images, 500 validation images and 1525 private testing images with 19 pre-defined semantic classes. The dataset has been widely adopted as the standard benchmark for both semantic segmentation and semantic edge detection. Following a number of previous works [12, 37, 38, 21], we comprehensively conduct ablation and quantitative studies for both segmentation and edge detection on the validation set.
Dataset Splits Yes Cityscapes [62] contains 2975 training images, 500 validation images and 1525 private testing images with 19 pre-defined semantic classes. The dataset has been widely adopted as the standard benchmark for both semantic segmentation and semantic edge detection. Following a number of previous works [12, 37, 38, 21], we comprehensively conduct ablation and quantitative studies for both segmentation and edge detection on the validation set. PASCAL VOC 2012 [64] is a semantic segmentation dataset with 1464 training, 1449 validation and 1456 test images. We use the augmented dataset with 10582 training images, as in [28]. COCO Panoptic [10] contains the mask annotations for both things and stuff, with a split setting (118K train2017 images and 5K val2017 images) following the detection community.
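As a quick sanity check on the split sizes quoted above, the counts can be captured as plain constants. The dictionary layout below is illustrative, not from the paper:

```python
# Split sizes as quoted in the reproducibility responses above.
SPLITS = {
    "cityscapes": {"train": 2975, "val": 500, "test": 1525},
    "voc12": {"train": 1464, "val": 1449, "test": 1456},
    "coco_panoptic": {"train2017": 118_000, "val2017": 5_000},
}

def total_images(dataset: str) -> int:
    """Sum of all listed split sizes for one dataset."""
    return sum(SPLITS[dataset].values())

print(total_images("cityscapes"))  # 5000 images in total
```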
Hardware Specification No We thank the NVIDIA GPU Cloud (NGC) team for the computing support of this work. (This mentions GPU Cloud but does not specify exact GPU models or other hardware details in the main text).
Software Dependencies No The paper describes the model architecture and training process, mentioning the use of CNNs and optimizers like SGD and ADAM, but it does not specify software dependencies with version numbers (e.g., specific versions of PyTorch, TensorFlow, or CUDA) in the main text.
Experiment Setup Yes During training, we unify the training crop size as 1024×1024 for Cityscapes, 472×472 for SBD and VOC12, and 464×464 for COCO Panoptic. All models are trained for 150k iterations on Cityscapes with batch size 8, 30k iterations on SBD and VOC12 with batch size 16, and 220k iterations on COCO Panoptic with batch size 16. We also perform data augmentation with random mirroring, scaling (scale factors in [0.5, 2.0]) and color jittering. We apply an SGD optimizer with a weight decay of 5×10⁻⁴ during training. For baselines and methods involving CSEL, we additionally apply a second ADAM optimizer to the propagation layers. The base learning rates for methods with ResNet-101/ResNet-38 backbones are unified as 3.0×10⁻⁸/7.0×10⁻⁸ across Cityscapes, SBD and VOC12. On COCO Panoptic, the base learning rate is unified as 5.0×10⁻⁸ for all compared methods. Unless indicated otherwise, the segmentation weight λ in Eq. (7) is empirically set to 0.5 to balance LSeg and LEdge.
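The iteration and batch-size settings above imply approximate epoch counts that the paper does not state directly. A back-of-the-envelope sketch (the epoch numbers are derived from the quoted figures, not quoted themselves):

```python
def approx_epochs(iterations: int, batch_size: int, num_train_images: int) -> float:
    """Epochs implied by a fixed iteration budget at a given batch size."""
    return iterations * batch_size / num_train_images

# Cityscapes: 150k iterations, batch size 8, 2975 training images.
print(round(approx_epochs(150_000, 8, 2975)))    # ~403 epochs
# VOC12 (augmented): 30k iterations, batch size 16, 10582 training images.
print(round(approx_epochs(30_000, 16, 10_582)))  # ~45 epochs
```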