LMSeg: Language-guided Multi-dataset Segmentation

Authors: Qiang Zhou, Yuang Liu, Chaohui Yu, Jingliang Li, Zhibin Wang, Fan Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method achieves significant improvements on four semantic and three panoptic segmentation datasets, and the ablation study evaluates the effectiveness of each component.
Researcher Affiliation | Collaboration | Qiang Zhou (1), Yuang Liu (2), Chaohui Yu (1), Jingliang Li (3), Zhibin Wang (1), Fan Wang (1); (1) Alibaba Group, (2) East China Normal University, (3) University of Chinese Academy of Sciences
Pseudocode | No | The paper includes diagrams of its framework and components (Figures 3, 4, and 5) but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | Yes | For semantic segmentation, we evaluate on four public semantic segmentation datasets: ADE20K (Zhou et al., 2017), COCO-Stuff-10K (Caesar et al., 2018), Cityscapes (Cordts et al., 2016), and Mapillary Vistas (Neuhold et al., 2017). For panoptic segmentation, we use COCO-Panoptic (Lin et al., 2014), ADE20K-Panoptic (Zhou et al., 2017) and Cityscapes-Panoptic (Cordts et al., 2016).
Dataset Splits | Yes | For semantic segmentation, we evaluate on four public semantic segmentation datasets: ADE20K (Zhou et al., 2017) (150 classes, containing 20k images for training and 2k images for validation), COCO-Stuff-10K (Caesar et al., 2018) (171 classes, containing 9k images for training and 1k images for testing), Cityscapes (Cordts et al., 2016) (19 classes, containing 2975 images for training, 500 images for validation and 1525 images for testing), and Mapillary Vistas (Neuhold et al., 2017) (65 classes, containing 18k images for training, 2k images for validation and 5k images for testing). (These counts are restated as a data structure in the sketch after this table.)
Hardware Specification | Yes | All models are trained with 8 A100 GPUs and a batch size of 16.
Software Dependencies | No | The paper states "We use Detectron2 (Wu et al., 2019) to implement our LMSeg." but does not provide specific version numbers for Detectron2 or any other software dependencies.
Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2019) and the poly (Chen et al., 2018) learning rate schedule with an initial learning rate of 1e-4 and a weight decay of 1e-4. A learning rate multiplier of 0.1 is applied to image encoders. For the ADE20K dataset, we use a crop size of 512×512. For the Cityscapes dataset, we use a crop size of 512×1024. For the COCO-Stuff-10K dataset, we use a crop size of 640×640. For the Mapillary Vistas dataset, we use a crop size of 1280×1280. All models are trained with 8 A100 GPUs and a batch size of 16. The hyper-parameters λ_focal and λ_dice are set to 20.0 and 1.0 by default. The weight for the no object (∅) in the contrastive loss L_cl is set to 0.1.
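
For quick reference, the split statistics quoted in the Dataset Splits row can be restated as a small data structure. The counts below are taken directly from that quote; the dictionary name and keys are purely illustrative.

```python
# Semantic-segmentation split statistics quoted in the "Dataset Splits" row.
# The counts come from the paper's text; the structure is illustrative only.
DATASET_SPLITS = {
    "ADE20K":           {"classes": 150, "train": 20_000, "val": 2_000},
    "COCO-Stuff-10K":   {"classes": 171, "train": 9_000, "test": 1_000},
    "Cityscapes":       {"classes": 19,  "train": 2_975, "val": 500, "test": 1_525},
    "Mapillary Vistas": {"classes": 65,  "train": 18_000, "val": 2_000, "test": 5_000},
}
```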
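
The Experiment Setup row maps naturally onto an optimizer and learning-rate-schedule configuration. The sketch below is not the authors' code (none was released); it only illustrates, in PyTorch, how the quoted values could be wired together. The learning rate, weight decay, 0.1 encoder multiplier, total batch size, and loss weights come from the quote; the poly power of 0.9, the iteration count, the `image_encoder` parameter-name filter, and the weighted-sum form of the mask loss are assumptions.

```python
# A minimal PyTorch sketch of the quoted training setup; values marked
# "assumed" are not stated in the paper and are used only for illustration.
import torch

BASE_LR = 1e-4            # initial learning rate (from the paper)
WEIGHT_DECAY = 1e-4       # weight decay (from the paper)
ENCODER_LR_MULT = 0.1     # learning-rate multiplier for image encoders (from the paper)
POLY_POWER = 0.9          # assumed; the common choice for the poly schedule (Chen et al., 2018)
MAX_ITERS = 160_000       # assumed total iterations; not stated in the quoted setup
TOTAL_BATCH_SIZE = 16     # from the paper; 8 A100 GPUs -> 2 images per GPU

LAMBDA_FOCAL = 20.0       # weight of the focal mask loss (from the paper)
LAMBDA_DICE = 1.0         # weight of the dice mask loss (from the paper)
NO_OBJECT_WEIGHT = 0.1    # weight of the no object (∅) entry in the contrastive loss (from the paper)


def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    """AdamW with a reduced learning rate on the image encoder parameters."""
    encoder_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # "image_encoder" is a hypothetical module name used only to split parameter groups.
        (encoder_params if "image_encoder" in name else other_params).append(param)
    return torch.optim.AdamW(
        [
            {"params": encoder_params, "lr": BASE_LR * ENCODER_LR_MULT},
            {"params": other_params, "lr": BASE_LR},
        ],
        lr=BASE_LR,
        weight_decay=WEIGHT_DECAY,
    )


def poly_lr_factor(cur_iter: int) -> float:
    """Poly schedule: factor = (1 - iter / max_iter) ** power, clamped at 0."""
    return max(0.0, 1.0 - cur_iter / MAX_ITERS) ** POLY_POWER


def build_scheduler(optimizer: torch.optim.Optimizer) -> torch.optim.lr_scheduler.LambdaLR:
    """Applies the poly factor multiplicatively to each group's base learning rate."""
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly_lr_factor)


def mask_loss(focal: torch.Tensor, dice: torch.Tensor) -> torch.Tensor:
    """Assumed weighted sum of the focal and dice mask losses."""
    return LAMBDA_FOCAL * focal + LAMBDA_DICE * dice
```

In a Detectron2-style trainer, the scheduler would typically be stepped once per iteration, which matches the per-iteration poly decay described by Chen et al. (2018); this detail is an assumption, as the paper does not spell out its training loop.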