A Simple Image Segmentation Framework via In-Context Examples
Authors: Yang Liu, Chenchen Jing, Hengtao Li, Muzhi Zhu, Hao Chen, Xinlong Wang, Chunhua Shen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various segmentation tasks show the effectiveness of the proposed method. Our code is released at: https://github.com/aim-uofa/SINE |
| Researcher Affiliation | Collaboration | Yang Liu1, Chenchen Jing1, Hengtao Li1, Muzhi Zhu1, Hao Chen1, Xinlong Wang3, Chunhua Shen1,2 — 1Zhejiang University, China; 2Ant Group; 3Beijing Academy of Artificial Intelligence |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at: https://github.com/aim-uofa/SINE |
| Open Datasets | Yes | Training Data We train our model with a diverse set of segmentation datasets, including semantic, instance, and panoptic segmentation. Specifically, we utilize three visual perception datasets: ADE20K [65] is a popular semantic segmentation dataset... COCO [31] is a widely-used dataset... Objects365 [51] is a large-scale high-quality object detection dataset. |
| Dataset Splits | Yes | ADE20K [65] is a popular semantic segmentation dataset... It has 25K images, including 20K for training, 2K for validation, and 3K for testing. |
| Hardware Specification | Yes | Our model is trained for 5 days by using 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software like DINOv2 and Adam optimizer, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train SINE for about 50K steps with a batch size of 64. We use the Adam [36] optimizer with β1 = 0.9 and β2 = 0.999. We use a linear learning rate scheduler with a base learning rate of 1e-4 and a warmup of 100 steps. The weight decay is set to 0.05. For data augmentation, we use random horizontal flipping and large-scale jittering (LSJ) [13] with a random scale sampled from the range 0.1 to 2.0, followed by a fixed-size crop to 896×896. A minimal sketch of this configuration is given below the table. |
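
The sketch below illustrates the reported training configuration (Adam with β1 = 0.9, β2 = 0.999, base LR 1e-4, weight decay 0.05, 100-step linear warmup, ~50K steps, LSJ augmentation with scale 0.1–2.0 and a 896×896 crop) in PyTorch. It is an orientation aid, not the authors' implementation: the model is a placeholder, the linear decay to zero after warmup and the per-image LSJ function are assumptions, and the exact setup lives in the released code at https://github.com/aim-uofa/SINE.

```python
# Sketch of the reported training hyperparameters (assumptions noted inline).
import random
import torch
import torchvision.transforms.functional as TF

TOTAL_STEPS = 50_000   # "about 50K steps"
WARMUP_STEPS = 100     # linear warmup of 100 steps
BASE_LR = 1e-4
CROP_SIZE = 896        # fixed-size crop to 896x896

model = torch.nn.Linear(1, 1)  # placeholder for the SINE model (see the official repo)
optimizer = torch.optim.Adam(
    model.parameters(), lr=BASE_LR, betas=(0.9, 0.999), weight_decay=0.05
)

def linear_schedule(step: int) -> float:
    """Linear warmup for 100 steps; linear decay to zero afterwards (decay is an assumption)."""
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_schedule)

def lsj_augment(image: torch.Tensor) -> torch.Tensor:
    """LSJ-style augmentation: random flip, random scale in [0.1, 2.0], pad, then 896x896 crop."""
    if random.random() < 0.5:
        image = TF.hflip(image)
    scale = random.uniform(0.1, 2.0)
    h, w = image.shape[-2:]
    image = TF.resize(image, [max(1, int(h * scale)), max(1, int(w * scale))], antialias=True)
    # Pad the right/bottom if the scaled image is smaller than the crop size.
    h, w = image.shape[-2:]
    image = TF.pad(image, [0, 0, max(0, CROP_SIZE - w), max(0, CROP_SIZE - h)])
    h, w = image.shape[-2:]
    top = random.randint(0, h - CROP_SIZE)
    left = random.randint(0, w - CROP_SIZE)
    return TF.crop(image, top, left, CROP_SIZE, CROP_SIZE)
```

In a training loop, `optimizer.step()` followed by `scheduler.step()` would be called once per iteration for TOTAL_STEPS iterations, with `lsj_augment` applied to each training image before batching.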