Prompting Multi-Modal Image Segmentation with Semantic Grouping
Authors: Qibin He
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show the superiority of our GoPT, which achieves SOTA performance on various downstream multi-modal image segmentation tasks by training only < 1% model parameters. |
| Researcher Affiliation | Academia | University of Chinese Academy of Sciences, Beijing, China; qibin.he@outlook.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the source code of the methodology described. |
| Open Datasets | Yes | (i) For RGB-D segmentation, we provide the comparison results of NYUDv2 (Silberman et al. 2012) and SUN RGB-D (Song, Lichtenberg, and Xiao 2015). (ii) For RGB-T segmentation, we evaluate our segmenter on MFNet (Ha et al. 2017) and PST900 (Shivakumar et al. 2020). (iii) For RGB-SAR segmentation, we report experimental results on WHU-OS (Li et al. 2022). |
| Dataset Splits | Yes | NYUDv2 [...] split into 795/654 for train/test with 40 classes. SUN RGB-D [...] We employ the standard train/test split. MFNet [...] The train set consists of 50% daytime images and 50% nighttime images, while the val and test sets contain 25% daytime images and 25% nighttime images. PST900 [...] The ratio of train/test set is 2/1. WHU-OS [...] splitting into 60%/20%/20% for train/val/test. |
| Hardware Specification | Yes | GoPT is trained on 1 NVIDIA Tesla A100 GPU with a batch size of 64 and fine-tuning epochs of 60. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not provide specific version numbers for software libraries or dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | GoPT is trained on 1 NVIDIA Tesla A100 GPU with a batch size of 64 and fine-tuning epochs of 60. AdamW (Loshchilov and Hutter 2019) is employed as the training optimizer, where the initial learning rate is 4×10⁻⁵ and scheduled following the polynomial annealing policy. The parameters of the pretrained foundation model remain fixed, while the learnable prompt parameters are initialized with the Xavier uniform scheme (Glorot and Bengio 2010). |
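
Below is a minimal PyTorch sketch of the fine-tuning configuration quoted in the Experiment Setup row: a frozen pretrained backbone, learnable prompt parameters with Xavier uniform initialization, AdamW at an initial learning rate of 4e-5, and polynomial learning-rate annealing over 60 epochs. The module and variable names (`foundation_model`, `prompt_params`), the prompt shape, and the polynomial power are illustrative assumptions, not details taken from the paper.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the frozen foundation model and the learnable
# grouping-prompt parameters; the paper does not specify these shapes.
foundation_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
)
prompt_params = nn.Parameter(torch.empty(64, 768))

# Freeze the pretrained backbone; only the prompt parameters are trained.
for p in foundation_model.parameters():
    p.requires_grad = False

# Xavier uniform initialization for the learnable prompts (Glorot & Bengio 2010).
nn.init.xavier_uniform_(prompt_params)

# AdamW with the reported initial learning rate of 4e-5.
optimizer = torch.optim.AdamW([prompt_params], lr=4e-5)

# Polynomial annealing over 60 fine-tuning epochs; power=1.0 is an assumption,
# the paper only states "polynomial annealing policy".
epochs = 60
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer, total_iters=epochs, power=1.0
)

for epoch in range(epochs):
    # ... forward/backward passes over batches of size 64 go here ...
    optimizer.step()
    scheduler.step()
```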