SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liangyan Gui, Tong Sun, Yu-Xiong Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we thoroughly evaluate SOHES on various datasets and examine the ViT-based backbone improvement for downstream tasks. We perform a series of ablation study experiments to demonstrate the efficacy of modules and steps in SOHES.
Researcher Affiliation | Collaboration | Shengcao Cao (1), Jiuxiang Gu (2), Jason Kuen (2), Hao Tan (2), Ruiyi Zhang (2), Handong Zhao (2), Ani Nenkova (2), Liang-Yan Gui (1), Tong Sun (2), Yu-Xiong Wang (1); (1) University of Illinois Urbana-Champaign, (2) Adobe Research
Pseudocode | No | The paper describes its method in a step-by-step manner (e.g., Step 1, Step 2, Step 3, Step 4) and includes figures illustrating the process, but it does not present formal pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper provides a project page ("Project page: https://SOHES.github.io"), but this is neither a direct link to a source-code repository (e.g., github.com/user/repo) nor an explicit statement confirming code release in the supplementary materials or elsewhere.
Open Datasets | Yes | We train our SOHES model on the SA-1B (Kirillov et al., 2023) dataset. [...] For evaluation purposes, we test SOHES on various image datasets with segmentation mask annotations in a zero-shot manner (...) MS-COCO (Lin et al., 2014), LVIS (Gupta et al., 2019), ADE20K (Zhou et al., 2017), EntitySeg (Qi et al., 2023), and SA-1B (Kirillov et al., 2023).
Dataset Splits | No | The paper specifies training and evaluation splits (2% of SA-1B for training, 0.1% for evaluation) but does not explicitly mention a distinct validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | The model is trained on 8 compute nodes, each equipped with 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions specific models (e.g., DINO, ViT-Adapter, Mask2Former, CascadePSP) and an optimizer (Adan), but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | The total batch size is 128, and the number of training steps is 40,000. We optimize the model with the Adan optimizer (Xie et al., 2022) and a base learning rate of 0.0008. [...] The teacher is updated as the exponential moving average of the student, with momentum m = 0.9995. [...] In the dynamic threshold, we set θ_score,large = 0.7, θ_score,small = 0.3, and γ = 200.
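To make the reported setup concrete, the sketch below collects the quoted hyperparameters into one configuration and shows the standard exponential-moving-average teacher update the paper describes (teacher <- m * teacher + (1 - m) * student). This is a minimal illustration under stated assumptions: the key names, the update_teacher helper, and the per-GPU batch arithmetic in the comments are not from the (unreleased) SOHES codebase.

```python
# Hypothetical training configuration assembled from the hyperparameters quoted above.
# Names are illustrative; SOHES has not released code, so none of this is official.
TRAIN_CONFIG = {
    "total_batch_size": 128,      # 8 nodes x 8 A100 GPUs = 64 GPUs (2 images/GPU if no gradient accumulation)
    "training_steps": 40_000,
    "optimizer": "Adan",          # Xie et al., 2022
    "base_learning_rate": 8e-4,
    "ema_momentum": 0.9995,       # momentum m for the teacher update
    "theta_score_large": 0.7,     # dynamic-threshold constants
    "theta_score_small": 0.3,
    "gamma": 200,
}


def update_teacher(teacher_params, student_params, m=TRAIN_CONFIG["ema_momentum"]):
    """Exponential-moving-average teacher update: teacher <- m * teacher + (1 - m) * student."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher_params, student_params)]
```

In a framework such as PyTorch, the same update would typically be applied in place to the teacher's parameters after each optimizer step on the student.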