LMSeg: Language-guided Multi-dataset Segmentation
Authors: Qiang Zhou, Yuang Liu, Chaohui Yu, Jingliang Li, Zhibin Wang, Fan Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves significant improvements on four semantic and three panoptic segmentation datasets, and the ablation study evaluates the effectiveness of each component. |
| Researcher Affiliation | Collaboration | Qiang Zhou¹, Yuang Liu², Chaohui Yu¹, Jingliang Li³, Zhibin Wang¹, Fan Wang¹ (¹Alibaba Group; ²East China Normal University; ³University of the Chinese Academy of Sciences) |
| Pseudocode | No | The paper includes diagrams of its framework and components (Figure 3, 4, 5) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | For semantic segmentation, we evaluate on four public semantic segmentation datasets: ADE20K (Zhou et al., 2017), COCO-Stuff-10K (Caesar et al., 2018), Cityscapes (Cordts et al., 2016), and Mapillary Vistas (Neuhold et al., 2017). For panoptic segmentation, we use COCO-Panoptic (Lin et al., 2014), ADE20K-Panoptic (Zhou et al., 2017) and Cityscapes-Panoptic (Cordts et al., 2016). |
| Dataset Splits | Yes | For semantic segmentation, we evaluate on four public semantic segmentation datasets: ADE20K (Zhou et al., 2017) (150 classes, containing 20k images for training and 2k images for validation), COCO-Stuff-10K (Caesar et al., 2018) (171 classes, containing 9k images for training and 1k images for testing), Cityscapes (Cordts et al., 2016) (19 classes, containing 2975 images for training, 500 images for validation and 1525 images for testing), and Mapillary Vistas (Neuhold et al., 2017) (65 classes, containing 18k images for training, 2k images for validation and 5k images for testing). |
| Hardware Specification | Yes | All models are trained with 8 A100 GPUs and a batch size of 16. |
| Software Dependencies | No | The paper states "We use Detectron2 (Wu et al., 2019) to implement our LMSeg." but does not provide specific version numbers for Detectron2 or any other software dependencies. |
| Experiment Setup | Yes | We use AdamW (Loshchilov & Hutter, 2019) and the poly (Chen et al., 2018) learning rate schedule with an initial learning rate of 1e-4 and a weight decay of 1e-4. A learning rate multiplier of 0.1 is applied to image encoders. For the ADE20K dataset, we use a crop size of 512×512. For the Cityscapes dataset, we use a crop size of 512×1024. For the COCO-Stuff-10K dataset, we use a crop size of 640×640. For the Mapillary Vistas dataset, we use a crop size of 1280×1280. All models are trained with 8 A100 GPUs and a batch size of 16. The hyper-parameters λ_focal and λ_dice are set to 20.0 and 1.0 by default. The weight for the no-object class (∅) in the contrastive loss L_cl is set to 0.1. (A hedged sketch of this setup follows the table.) |
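
For reference, the sketch below illustrates how the training configuration quoted in the Experiment Setup row could be expressed in a PyTorch-style implementation. It is a minimal sketch, not the authors' released code: the module and helper names (`image_encoder`, `other_parameters`) and the poly exponent of 0.9 are illustrative assumptions, not details stated in the paper.

```python
# Minimal sketch (not the authors' released code) of the reported training setup,
# assuming a PyTorch-style implementation.
import torch

# Per-dataset crop sizes (H, W) as reported in the paper.
CROP_SIZES = {
    "ade20k": (512, 512),
    "cityscapes": (512, 1024),
    "coco_stuff_10k": (640, 640),
    "mapillary_vistas": (1280, 1280),
}

# Loss weighting: lambda_focal = 20.0, lambda_dice = 1.0; the no-object class
# in the contrastive loss L_cl is down-weighted to 0.1.
LAMBDA_FOCAL, LAMBDA_DICE, NO_OBJECT_WEIGHT = 20.0, 1.0, 0.1


def build_optimizer_and_scheduler(model, max_iters):
    """AdamW with a 0.1 LR multiplier on the image encoder and a poly LR schedule."""
    base_lr, weight_decay = 1e-4, 1e-4
    param_groups = [
        # Image encoder gets a 0.1 learning-rate multiplier.
        {"params": model.image_encoder.parameters(), "lr": base_lr * 0.1},
        # All remaining parameters use the base learning rate
        # (`other_parameters` is a hypothetical helper on the model).
        {"params": model.other_parameters(), "lr": base_lr},
    ]
    optimizer = torch.optim.AdamW(param_groups, lr=base_lr, weight_decay=weight_decay)

    # Poly schedule: lr_t = lr_0 * (1 - t / max_iters) ** 0.9. The 0.9 exponent is
    # the common DeepLab default; the paper does not state it explicitly.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda it: (1.0 - it / max_iters) ** 0.9
    )
    return optimizer, scheduler
```

In practice the paper implements LMSeg on Detectron2, where the same optimizer, schedule, and per-dataset crop settings would more likely be expressed declaratively through its config system rather than hand-built as above.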