Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

Authors: Muzhi Zhu, Yang Liu, Zekai Luo, Chenchen Jing, Hao Chen, Guangkai Xu, Xinlong Wang, Chunhua Shen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our method significantly outperforms the previous SOTA models in multiple settings.
Researcher Affiliation | Academia | Zhejiang University; Beijing Academy of Artificial Intelligence
Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured steps formatted like code or an algorithm.
Open Source Code | Yes | Our code is released at: https://github.com/aim-uofa/DiffewS
Open Datasets | Yes | Following the few-shot setting on COCO-20i [52], we organize 80 classes from COCO2014 [53] into 4 folds. Each trial consists of 60 classes allocated for training and 20 classes designated for testing. For evaluation, we randomly sample 1000 reference-target pairs in each fold with the same seed used in HSNet [18]. 2. In-context setting: Following the setting in SegGPT [30], COCO, ADE [54], and PASCAL VOC [55] serve as the training set. (See the fold-split sketch after the table.)
Dataset Splits | No | The paper mentions training and testing sets, and notes that ablation experiments were 'validated on Fold0 of COCO-20i', but it does not specify an explicit validation split (e.g., percentages or counts) distinct from the training and test sets used for overall model training.
Hardware Specification | Yes | Under the strict few-shot setting, the model undergoes training on four V100 GPUs. ... The training took place on a single 4090 GPU, with a gradient accumulation set at 4, which brought the total batch size to 4.
Software Dependencies | No | The paper mentions 'Stable Diffusion 2.1' (a model) and 'Adam optimizer' (an algorithm), but does not provide specific version numbers for software libraries, programming languages, or other ancillary software dependencies (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We initialize our model with Stable Diffusion 2.1 [2]. The Adam optimizer is used with a weight decay set at 0.01 and a learning rate of 1e-5, coupled with a linear schedule. In terms of data augmentation, our methodology only involves resizing the input image directly to 512x512. No additional data augmentation occurs. ... With the gradient accumulation set at 4, the total batch size comes to 16. Training carries out for 10,000 iterations, typically requiring six hours. For in-context setting, ... adjust the total training iterations to 30000 iterations. (See the training-setup sketch after the table.)
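
The "Open Datasets" row describes the COCO-20i protocol: 80 COCO-2014 classes organized into 4 folds, with 60 classes used for training and the held-out 20 for testing in each fold, and 1000 reference-target pairs sampled per fold with a fixed seed. The sketch below is a minimal, illustrative reading of that protocol; the interleaved fold assignment, the default seed value, and the helper names (split_fold, sample_eval_pairs) are assumptions made for illustration, not the authors' released code, which follows HSNet's split and seed.

```python
import random

NUM_CLASSES = 80
NUM_FOLDS = 4  # COCO-20i: 4 folds of 20 classes each

def split_fold(fold: int):
    """Return (train_classes, test_classes) for one COCO-20i fold.

    Assumes the interleaved assignment commonly used for COCO-20i,
    where fold i holds out classes with class_id % 4 == i.
    """
    test_classes = [c for c in range(NUM_CLASSES) if c % NUM_FOLDS == fold]
    train_classes = [c for c in range(NUM_CLASSES) if c % NUM_FOLDS != fold]
    return train_classes, test_classes

def sample_eval_pairs(test_classes, images_per_class, num_pairs=1000, seed=0):
    """Sample reference-target evaluation episodes with a fixed seed.

    `seed=0` is a placeholder; the paper reuses the seed from HSNet.
    `images_per_class` maps each class id to its list of image ids.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        cls = rng.choice(test_classes)
        ref_img, tgt_img = rng.sample(images_per_class[cls], 2)
        pairs.append((cls, ref_img, tgt_img))
    return pairs

# Toy usage: fold 0 with a dummy image pool (real runs use COCO-2014 ids).
train_cls, test_cls = split_fold(fold=0)
toy_pool = {c: list(range(10)) for c in test_cls}
episodes = sample_eval_pairs(test_cls, toy_pool, num_pairs=5)
```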
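
The "Experiment Setup" row quotes the few-shot training hyperparameters: initialization from Stable Diffusion 2.1, Adam with weight decay 0.01 and learning rate 1e-5, a linear schedule, 512x512 resize-only inputs, gradient accumulation of 4 for a total batch size of 16, and 10,000 iterations. The sketch below wires those numbers into a generic PyTorch loop so the configuration is concrete; the placeholder model, the dummy batches, and the reading of "linear schedule" as a decay to zero are assumptions, not the paper's implementation.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LinearLR

# Placeholder network so the snippet runs on its own; the paper instead
# initializes its model from Stable Diffusion 2.1.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

TOTAL_ITERS = 10_000   # quoted training length (few-shot setting)
GRAD_ACCUM = 4         # quoted accumulation; 4 images x 4 steps = total batch 16

optimizer = Adam(model.parameters(), lr=1e-5, weight_decay=0.01)
# "Linear schedule" is read here as a linear decay of the learning rate to zero.
scheduler = LinearLR(optimizer, start_factor=1.0, end_factor=0.0,
                     total_iters=TOTAL_ITERS)

for step in range(TOTAL_ITERS):
    optimizer.zero_grad()
    for _ in range(GRAD_ACCUM):
        # Dummy batch standing in for images resized to 512x512,
        # the only augmentation quoted in the setup.
        images = torch.randn(4, 3, 512, 512)
        loss = model(images).mean()  # placeholder loss, not the paper's objective
        (loss / GRAD_ACCUM).backward()
    optimizer.step()
    scheduler.step()
```

For the in-context setting, the quoted change is simply extending TOTAL_ITERS to 30,000 while keeping the rest of the configuration.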