Spider: A Unified Framework for Context-dependent Concept Segmentation
Authors: Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Spider significantly outperforms the state-of-the-art specialized models in 8 different context-dependent segmentation tasks, including 4 natural scenes (salient, camouflaged, and transparent objects and shadow) and 4 medical lesions (COVID-19, polyp, breast, and skin lesion with colonoscopy, CT, ultrasound, and dermoscopy modalities). |
| Researcher Affiliation | Collaboration | 1Dalian University of Technology, China 2X3000 Inspection Co., Ltd, China 3Yale University, America. |
| Pseudocode | Yes | Algorithm 1 Training and Inference |
| Open Source Code | No | The source code will be publicly available at Spider-UniCDSeg. |
| Open Datasets | Yes | The dataset information is shown in Table 1. We follow the training settings of recent state-of-the-art methods in these tasks and merge all training samples together as our training set. Table 1 lists datasets such as DUTS (Wang et al., 2017), COD10K (Fan et al., 2020a), and others, indicating widely used public datasets with citations. |
| Dataset Splits | No | Table 1 lists '#Train' and '#Test' datasets but does not provide specific information about a distinct validation split for the datasets used in the experiments. |
| Hardware Specification | Yes | All the experiments are implemented on 8 Tesla A100 GPUs for training 50 epochs. |
| Software Dependencies | No | The paper mentions using specific optimizers like Adam and backbones like ViT, Swin, and ConvNeXt, but does not provide specific version numbers for these software components or other libraries/frameworks (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The input resolutions of images are resized to 384×384. For each task, the mini-batch sizes of the input and prompt are set to 4 and 12, respectively. We adopt some basic image augmentation techniques to avoid overfitting, including random flipping, rotating and border clipping. The Adam (Kingma & Ba, 2015) optimizer scheduled by step with initial learning rate of 0.0001, decay size of 30 and decay rate of 0.9 is introduced to update model parameters. |
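The optimization recipe quoted above (Adam, initial LR 1e-4, step decay of size 30 with rate 0.9, 50 epochs) maps directly onto standard PyTorch components. The sketch below is an assumption about how that schedule would be configured, not the authors' released code; the `nn.Linear` model is a hypothetical stand-in for Spider, and the training-step internals are elided.

```python
import torch
from torch import nn

# Hypothetical stand-in for the Spider model; the real architecture uses
# ViT/Swin/ConvNeXt backbones per the paper.
model = nn.Linear(8, 1)

# Reported setup: Adam with initial learning rate 0.0001,
# stepped every 30 epochs by a decay rate of 0.9.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.9)

for epoch in range(50):  # the paper reports training for 50 epochs
    # ... forward pass, loss computation, backward(), optimizer.step() ...
    optimizer.zero_grad()
    scheduler.step()

# With decay size 30, the LR decays once over 50 epochs: 1e-4 * 0.9 = 9e-5
print(scheduler.get_last_lr()[0])
```

Under this schedule the learning rate only steps once (at epoch 30) within the 50-epoch budget, ending at 9e-5.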