Prompting Multi-Modal Image Segmentation with Semantic Grouping

Authors: Qibin He

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the superiority of our GoPT, which achieves SOTA performance on various downstream multi-modal image segmentation tasks by training only < 1% model parameters.
Researcher Affiliation | Academia | University of Chinese Academy of Sciences, Beijing, China; qibin.he@outlook.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access information (a specific repository link, an explicit code release statement, or code in supplementary materials) for the source code of the methodology described.
Open Datasets | Yes | (i) For RGB-D segmentation, we provide the comparison results of NYUDv2 (Silberman et al. 2012) and SUN RGB-D (Song, Lichtenberg, and Xiao 2015). (ii) For RGB-T segmentation, we evaluate our segmenter on MFNet (Ha et al. 2017) and PST900 (Shivakumar et al. 2020). (iii) For RGB-SAR segmentation, we report experimental results on WHU-OS (Li et al. 2022).
Dataset Splits | Yes | NYUDv2 [...] split into 795/654 for train/test with 40 classes. SUN RGB-D [...] We employ the standard train/test split. MFNet [...] The train set consists of 50% daytime images and 50% nighttime images, while the val and test sets contain 25% daytime images and 25% nighttime images. PST900 [...] The ratio of train/test set is 2/1. WHU-OS [...] splitting into 60%/20%/20% for train/val/test.
Hardware Specification | Yes | GoPT is trained on 1 NVIDIA Tesla A100 GPU with a batch size of 64 and fine-tuning epochs of 60.
Software Dependencies | No | The paper mentions using the AdamW optimizer but does not provide specific version numbers for software libraries or dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | GoPT is trained on 1 NVIDIA Tesla A100 GPU with a batch size of 64 and fine-tuning epochs of 60. AdamW (Loshchilov and Hutter 2019) is employed as the training optimizer, where the initial learning rate is 4 × 10⁻⁵ and scheduled following the polynomial annealing policy. The parameters of the pretrained foundation model remain fixed, while the learnable prompt parameters are initialized with the Xavier uniform scheme (Glorot and Bengio 2010).
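
The experiment setup quoted above describes a standard parameter-efficient fine-tuning recipe: frozen foundation model, learnable prompt parameters with Xavier-uniform initialization, and AdamW with a polynomially decayed learning rate of 4 × 10⁻⁵ over 60 epochs. The following is a minimal sketch of that configuration, assuming a PyTorch workflow; the backbone stand-in, the prompt tensor shape, and the polynomial power are illustrative assumptions, not details reported in the paper.

# Minimal sketch of the quoted fine-tuning recipe, assuming PyTorch.
# Only the frozen backbone, Xavier-uniform prompt init, AdamW with lr 4e-5,
# polynomial decay, batch size 64, and 60 epochs come from the quoted text;
# module names and shapes below are illustrative placeholders.
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import PolynomialLR

backbone = nn.Linear(768, 768)                      # stand-in for the frozen pretrained foundation model
prompt_tokens = nn.Parameter(torch.empty(16, 768))  # hypothetical learnable prompt parameters

# "The parameters of the pretrained foundation model remain fixed"
for p in backbone.parameters():
    p.requires_grad = False

# "the learnable prompt parameters are initialized with the Xavier uniform scheme"
nn.init.xavier_uniform_(prompt_tokens)

# AdamW optimizer with an initial learning rate of 4e-5, annealed polynomially over 60 epochs
epochs = 60
optimizer = AdamW([prompt_tokens], lr=4e-5)
scheduler = PolynomialLR(optimizer, total_iters=epochs, power=1.0)

for epoch in range(epochs):
    # ... iterate over batches of 64 multi-modal images, compute the
    # segmentation loss, and call loss.backward() here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()

Note that only the prompt parameters are handed to the optimizer; keeping the backbone out of the parameter group is what limits the trainable footprint to the "< 1% model parameters" quoted above.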