Segment Everything Everywhere All at Once
Authors: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. Notably, our single SEEM model achieves competitive performance across interactive segmentation, generic segmentation, referring segmentation, and video object segmentation on 9 datasets with minimum 1/100 supervision. |
| Researcher Affiliation | Collaboration | University of Wisconsin-Madison; Microsoft Research, Redmond; HKUST; Microsoft Cloud & AI |
| Pseudocode | Yes | We summarize the training and evaluation pipeline of the proposed method with PyTorch-style pseudocode in Algorithm 1. |
| Open Source Code | No | The paper does not contain an explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | SEEM is trained on three tasks: panoptic segmentation, referring segmentation, and interactive segmentation. Panoptic and interactive segmentation are trained on COCO2017 [51] with panoptic segmentation annotations. Following [11], we exclude the validation set of RefCOCOg [52], resulting in 107K segmentation images in total. For referring segmentation, we use a combination of RefCOCO, RefCOCOg, and RefCOCO+ annotations on COCO images. |
| Dataset Splits | Yes | Following [11], we exclude the validation set of RefCOCOg [52], resulting in 107K segmentation images in total. |
| Hardware Specification | No | The paper mentions using specific vision backbones and language encoders (e.g., FocalT [58], DaViT-d3 (B), DaViT-d5 (L) [59], UniCL or Florence text encoder [60, 61]) but does not specify the underlying hardware (e.g., GPU models, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' and specific models/encoders (e.g., FocalT, DaViT, UniCL, Florence text encoder) but does not provide specific version numbers for PyTorch or any other ancillary software libraries or solvers used in the experiments. |
| Experiment Setup | Yes | $L = \alpha L^{c}_{\text{CE\_pano}} + \beta L^{m}_{\text{BCE\_pano}} + \gamma L^{m}_{\text{DICE\_pano}} + a L^{c}_{\text{CE\_ref}} + b L^{m}_{\text{BCE\_ref}} + c L^{m}_{\text{DICE\_ref}} + a L^{c}_{\text{CE\_iseg}} + b L^{m}_{\text{BCE\_iseg}} + c L^{m}_{\text{DICE\_iseg}}$ (8), where $\alpha = 2$, $\beta = \gamma = 5$, $a = 0.2$, $b = c = 2$; CE, BCE, and DICE denote cross-entropy, binary cross-entropy, and Dice loss, respectively. (A minimal PyTorch sketch of this weighted sum follows the table.) |
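
As a concrete illustration of Eq. (8), the sketch below combines the nine per-task losses with the reported weights in PyTorch. This is a minimal sketch under stated assumptions: the dictionary keys (`ce_pano`, `bce_ref`, etc.) and the `combined_loss` helper are hypothetical names introduced for illustration, not the authors' released implementation.

```python
import torch

# Loss weights reported with Eq. (8): alpha = 2, beta = gamma = 5,
# a = 0.2, b = c = 2.
ALPHA, BETA, GAMMA = 2.0, 5.0, 5.0
A, B, C = 0.2, 2.0, 2.0

def combined_loss(losses: dict) -> torch.Tensor:
    """Weighted sum of per-task losses as in Eq. (8).

    `losses` maps hypothetical keys such as "ce_pano" (class cross-entropy,
    panoptic branch) or "bce_ref" (mask binary cross-entropy, referring
    branch) to scalar loss tensors; the key names are illustrative.
    """
    return (
        ALPHA * losses["ce_pano"]      # class CE, panoptic
        + BETA * losses["bce_pano"]    # mask BCE, panoptic
        + GAMMA * losses["dice_pano"]  # mask Dice, panoptic
        + A * losses["ce_ref"]         # class CE, referring
        + B * losses["bce_ref"]        # mask BCE, referring
        + C * losses["dice_ref"]       # mask Dice, referring
        + A * losses["ce_iseg"]        # class CE, interactive
        + B * losses["bce_iseg"]       # mask BCE, interactive
        + C * losses["dice_iseg"]      # mask Dice, interactive
    )

# Example with dummy unit losses: the weights sum to 20.4.
dummy = {k: torch.tensor(1.0) for k in [
    "ce_pano", "bce_pano", "dice_pano",
    "ce_ref", "bce_ref", "dice_ref",
    "ce_iseg", "bce_iseg", "dice_iseg",
]}
print(combined_loss(dummy))  # ~tensor(20.4000)
```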