Segment Everything Everywhere All at Once

Authors: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. Notably, our single SEEM model achieves competitive performance across interactive segmentation, generic segmentation, referring segmentation, and video object segmentation on 9 datasets with minimum 1/100 supervision.
Researcher Affiliation | Collaboration | University of Wisconsin-Madison; Microsoft Research, Redmond; HKUST; Microsoft Cloud & AI
Pseudocode | Yes | We summarize the training and evaluation pipeline of the proposed method with PyTorch-style pseudocode in Algorithm 1. (An illustrative PyTorch-style sketch of such a training step is given after the table.)
Open Source Code | No | The paper does not contain an explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | SEEM is trained on three tasks: panoptic segmentation, referring segmentation, and interactive segmentation. Panoptic and interactive segmentation are trained on COCO2017 [51] with panoptic segmentation annotations. Following [11], we exclude the validation set of Ref-COCOg [52], resulting in 107K segmentation images in total. For referring segmentation, we use a combination of Ref-COCO, Ref-COCOg, and Ref-COCO+ annotations for COCO images. (A sketch of this split exclusion is given after the table.)
Dataset Splits | Yes | Following [11], we exclude the validation set of Ref-COCOg [52], resulting in 107K segmentation images in total.
Hardware Specification | No | The paper mentions using specific vision backbones and language encoders (e.g., FocalT [58], DaViT-d3 (B), DaViT-d5 (L) [59], UniCL or Florence text encoder [60, 61]) but does not specify the underlying hardware (e.g., GPU models, CPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' and specific models/encoders (e.g., FocalT, DaViT, UniCL, Florence text encoder) but does not provide specific version numbers for PyTorch or any other ancillary software libraries or solvers used in the experiments.
Experiment Setup | Yes | L = α·L^c_CE_pano + β·L^m_BCE_pano + γ·L^m_DICE_pano + a·L^c_CE_ref + b·L^m_BCE_ref + c·L^m_DICE_ref + a·L^c_CE_iseg + b·L^m_BCE_iseg + c·L^m_DICE_iseg (Eq. 8), where α = 2, β = γ = 5, a = 0.2, b = c = 2; CE, BCE, and DICE denote cross-entropy, binary cross-entropy, and Dice loss, respectively. (A code sketch of this weighted combination follows the table.)
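
For concreteness, the weighted combination in Eq. 8 can be written out in a few lines of PyTorch-style code. This is a minimal sketch built only from the coefficients quoted above; the per-task loss dictionaries and their keys ('ce', 'bce', 'dice') are hypothetical placeholders, not the authors' implementation.

import torch

# Loss weights quoted from Eq. 8: the panoptic branch uses (alpha, beta, gamma);
# the referring and interactive branches share (a, b, c).
ALPHA, BETA, GAMMA = 2.0, 5.0, 5.0
A, B, C = 0.2, 2.0, 2.0

def combine_losses(pano, ref, iseg):
    # Each argument: dict with scalar tensors under hypothetical keys
    # 'ce' (class cross-entropy), 'bce' (mask binary cross-entropy), 'dice' (mask Dice).
    return (ALPHA * pano["ce"] + BETA * pano["bce"] + GAMMA * pano["dice"]
            + A * ref["ce"] + B * ref["bce"] + C * ref["dice"]
            + A * iseg["ce"] + B * iseg["bce"] + C * iseg["dice"])

# Toy check with unit losses: total = (2 + 5 + 5) + 2 * (0.2 + 2 + 2) = 20.4
unit = lambda: {"ce": torch.tensor(1.0), "bce": torch.tensor(1.0), "dice": torch.tensor(1.0)}
total = combine_losses(unit(), unit(), unit())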
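
The Pseudocode row refers to Algorithm 1 of the paper, which is not reproduced in this table. The sketch below only illustrates the overall shape of such a multi-task training step, assuming a hypothetical seem_model that accepts an image batch plus optional text or visual prompts and returns the per-task loss dictionaries consumed by combine_losses above; seem_model, sample_interactive_prompts, and the batch keys are placeholders, not the authors' API.

def train_step(seem_model, optimizer, batch):
    images = batch["images"]

    # Generic (panoptic) segmentation: no prompts, only learnable object queries.
    pano_losses = seem_model(images, targets=batch["panoptic_targets"])

    # Referring segmentation: text prompts from Ref-COCO / Ref-COCOg / Ref-COCO+.
    ref_losses = seem_model(images,
                            text_prompts=batch["referring_texts"],
                            targets=batch["referring_targets"])

    # Interactive segmentation: visual prompts (e.g., simulated clicks or scribbles)
    # derived from the panoptic ground truth.
    prompts = sample_interactive_prompts(batch["panoptic_targets"])  # placeholder helper
    iseg_losses = seem_model(images,
                             visual_prompts=prompts,
                             targets=batch["panoptic_targets"])

    loss = combine_losses(pano_losses, ref_losses, iseg_losses)  # Eq. 8 weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()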
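
The Open Datasets and Dataset Splits rows state that COCO2017 panoptic annotations are combined with Ref-COCO/Ref-COCOg/Ref-COCO+ referring annotations while images from the Ref-COCOg validation split are excluded, leaving 107K training images. A minimal sketch of that exclusion step is shown below; the file paths and JSON field names are assumptions for illustration, not the paper's actual data layout.

import json

# Hypothetical annotation files; the paper does not specify exact paths or formats.
COCO_PANOPTIC_JSON = "annotations/panoptic_train2017.json"
REFCOCOG_VAL_JSON = "annotations/refcocog_val.json"

def training_image_ids(panoptic_json=COCO_PANOPTIC_JSON, refcocog_val_json=REFCOCOG_VAL_JSON):
    # COCO2017 train images with panoptic annotations.
    with open(panoptic_json) as f:
        train_ids = {img["id"] for img in json.load(f)["images"]}
    # Images referenced by the Ref-COCOg validation split (assumed 'image_id' field).
    with open(refcocog_val_json) as f:
        val_ids = {ann["image_id"] for ann in json.load(f)["annotations"]}
    # Drop any training image that appears in the Ref-COCOg val split (~107K remain per the paper).
    return train_ids - val_ids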