SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liangyan Gui, Tong Sun, Yu-Xiong Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we thoroughly evaluate SOHES on various datasets and examine the ViT-based backbone improvement for downstream tasks. We perform a series of ablation study experiments to demonstrate the efficacy of modules and steps in SOHES. |
| Researcher Affiliation | Collaboration | Shengcao Cao¹, Jiuxiang Gu², Jason Kuen², Hao Tan², Ruiyi Zhang², Handong Zhao², Ani Nenkova², Liang-Yan Gui¹, Tong Sun², Yu-Xiong Wang¹ (¹University of Illinois Urbana-Champaign, ²Adobe Research) |
| Pseudocode | No | The paper describes its method in a step-by-step manner (e.g., Step 1, Step 2, Step 3, Step 4) and includes figures illustrating the process, but it does not present formal pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper provides a "Project page: https://SOHES.github.io." However, this is a project page URL, not a direct link to a source-code repository (e.g., github.com/user/repo) nor an explicit statement confirming code release in supplementary materials or similar. |
| Open Datasets | Yes | We train our SOHES model on the SA-1B (Kirillov et al., 2023) dataset. [...] For evaluation purposes, we test SOHES on various image datasets with segmentation mask annotations in a zero-shot manner (...) MS-COCO (Lin et al., 2014), LVIS (Gupta et al., 2019), ADE20K (Zhou et al., 2017), EntitySeg (Qi et al., 2023), and SA-1B (Kirillov et al., 2023). |
| Dataset Splits | No | The paper specifies training and evaluation subsets of SA-1B (2% of the data for training, 0.1% for evaluation) but does not explicitly mention a distinct 'validation' split for hyperparameter tuning or model selection. (A hypothetical split-sampling sketch follows the table.) |
| Hardware Specification | Yes | The model is trained on 8 compute nodes, each equipped with 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions specific models (e.g., DINO, ViT-Adapter, Mask2Former, CascadePSP) and an optimizer (Adan), but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The total batch size is 128, and the number of training steps is 40,000. We optimize the model with the Adan optimizer (Xie et al., 2022) and a base learning rate of 0.0008. [...] The teacher is updated as the exponential moving average of the student, with momentum m = 0.9995. [...] In the dynamic threshold, we set θ_score,large = 0.7, θ_score,small = 0.3, γ = 200. (Hedged sketches of the EMA update and dynamic threshold follow the table.) |
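
Regarding the Dataset Splits row: the report quotes only the split sizes (2% of SA-1B for training, 0.1% for evaluation), not how the subsets were chosen. The following is a hypothetical illustration of carving such disjoint subsets from a list of SA-1B image IDs; the uniform random sampling, the `sample_sa1b_splits` helper, and the placeholder IDs are assumptions, not the authors' procedure.

```python
import random

# Hypothetical illustration only: disjoint training (2%) and evaluation (0.1%)
# subsets drawn from a list of SA-1B image IDs. The sampling strategy is an
# assumption; the paper states the split sizes but not the selection method.
def sample_sa1b_splits(image_ids, train_frac=0.02, eval_frac=0.001, seed=0):
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_eval = int(len(ids) * eval_frac)
    return ids[:n_train], ids[n_train:n_train + n_eval]

# Example with placeholder IDs (SA-1B itself contains ~11M images).
train_ids, eval_ids = sample_sa1b_splits([f"sa_{i:07d}" for i in range(10000)])
print(len(train_ids), len(eval_ids))  # 200, 10
```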
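
Regarding the Experiment Setup row: the quoted hyperparameters (EMA momentum m = 0.9995; θ_score,large = 0.7, θ_score,small = 0.3, γ = 200) can be made concrete with a minimal sketch. The EMA update below is the standard teacher-student formulation; the linear ramp used for the dynamic score threshold is an assumption, since the paper's exact interpolation formula is not reproduced in this table.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.9995) -> None:
    """Update the teacher as an exponential moving average of the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def dynamic_score_threshold(mask_area: float,
                            theta_small: float = 0.3,
                            theta_large: float = 0.7,
                            gamma: float = 200.0) -> float:
    """Score threshold that tightens as the mask grows (assumed linear ramp).

    Masks with area well below gamma are kept with the looser threshold
    theta_small; masks with area at or above gamma must exceed theta_large.
    """
    ratio = min(mask_area / gamma, 1.0)
    return theta_small + (theta_large - theta_small) * ratio

# Example: a tiny mask gets a lenient threshold, a large mask a strict one.
print(dynamic_score_threshold(20.0), dynamic_score_threshold(5000.0))  # ~0.34, 0.7
```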