Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Authors: Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimentation across diverse benchmarks spanning multiple domains underscores Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.
Researcher Affiliation | Collaboration | Zihan Zhong (Tsinghua University) zhongzh22@mails.tsinghua.edu.cn; Zhiqiang Tang (Amazon Web Services) zqtang@amazon.com; Tong He (Amazon Web Services) htong@amazon.com; Haoyang Fang (Amazon Web Services) haoyfang@amazon.com; Chun Yuan (Tsinghua University) yuanc@sz.tsinghua.edu.cn
Pseudocode | No | The paper describes the proposed method using equations and textual explanations, but no explicitly labeled pseudocode or algorithm blocks were found.
Open Source Code | Yes | Our code is publicly available at https://github.com/autogluon/autogluon/tree/master/examples/automm/Conv-LoRA
Open Datasets | Yes | Our experiments encompass semantic segmentation datasets from various domains, spanning natural images, medical images, agriculture, and remote sensing. In the natural image domain, we explore two specific tasks: camouflaged object segmentation (Fan et al., 2020a; Skurowski et al., 2018; Le et al., 2019) and shadow detection (Vicente et al., 2016). Within medical segmentation, we investigate polyp segmentation (Jha et al., 2020; Bernal et al., 2015; Tajbakhsh et al., 2015; Vázquez et al., 2017; Silva et al., 2014) and skin lesion segmentation (Codella et al., 2018). For agriculture and remote sensing, we employ the leaf disease segmentation (Rath, 2023) and road segmentation (Mnih, 2013) datasets as representative examples, respectively. We also explore multi-class transparent object segmentation using Trans10K-v1 (Xie et al., 2020) with 3 classes and Trans10K-v2 (Xie et al., 2021b) with 12 fine-grained classes. Further details about each dataset can be found in appendix C.
Dataset Splits | Yes | Additionally, we randomly divide a validation set comprising 20% of the images from the training set, for validation during training. For agriculture and remote sensing, we employ the leaf disease segmentation (Rath, 2023) and road segmentation (Mnih, 2013) datasets as representative examples, respectively. We also explore multi-class transparent object segmentation using Trans10K-v1 (Xie et al., 2020) with 3 classes and Trans10K-v2 (Xie et al., 2021b) with 12 fine-grained classes. Further details about each dataset can be found in appendix C.
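The quoted split procedure (randomly holding out 20% of the training images for validation) can be sketched as below. This is an illustrative reconstruction, not the authors' script; the seed, shuffling order, and file naming are assumptions.

```python
import random

def split_train_val(image_paths, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the training images for validation,
    mirroring the paper's 20% split (exact seed and ordering are unknown)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (train, val)

# Hypothetical file names, purely for demonstration.
train, val = split_train_val([f"img_{i}.png" for i in range(100)])
print(len(train), len(val))  # 80 20
```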
Hardware Specification | Yes | In table 14, we compare the training / inference speed and per epoch training time with the ISIC 2017 dataset and a single V100 GPU.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers (e.g., Python version, PyTorch version, specific library versions).
Experiment Setup | Yes | We use the batch size of 4 and Adam optimizer with learning rate of 1×10⁻⁴ as default, with a weight decay of 1×10⁻⁴. A larger learning rate of 3×10⁻⁴ is found useful for the datasets we use in agriculture and remote sensing. The random horizontal flip is applied during training as data augmentation. All the methods are trained for 30 epochs with structure loss (i.e., the combination of weighted IoU loss and binary cross entropy loss) unless otherwise specified. Additionally, our Conv-LoRA follows Shazeer et al. (2017) to introduce extra loss for balancing the utilization among the experts. The weight of the extra loss is set to 1.0 and 2.0 for binary-class and multi-class semantic segmentation respectively. We set the number of experts to be 8 by default, with each expert specializing in a scaling ratio within the continuous range from 1 to 8. And we apply Conv-LoRA to the query, key and value matrices in self-attention layers, same as how LoRA does.
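Since the adapters are applied to the query, key, and value projections "same as how LoRA does", a plain-LoRA forward pass illustrates the baseline being extended. This is a minimal dependency-free sketch of standard LoRA only; the paper's Conv-LoRA additionally inserts a convolutional mixture-of-experts inside the low-rank bottleneck, which is not reproduced here. All matrix values are hypothetical.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Plain LoRA adapter: y = W x + (alpha / r) * B (A x),
    with the base weight W frozen and only A, B trained."""
    def __init__(self, W, A, B, alpha=4):
        self.W, self.A, self.B = W, A, B
        self.scale = alpha / len(A)  # r = rank = number of rows of A

    def __call__(self, x):
        base = matvec(self.W, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

# With B zero-initialized (LoRA's standard init), the adapted projection
# initially reproduces the frozen base layer exactly.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base projection (hypothetical)
A = [[0.1, 0.2]]              # rank-1 down-projection (hypothetical)
B = [[0.0], [0.0]]            # zero-initialized up-projection
layer = LoRALinear(W, A, B)
print(layer([3.0, 4.0]))      # equals W @ x = [3.0, 4.0]
```

In SAM's ViT encoder, one such adapter would sit on each of the query, key, and value weight matrices of every self-attention layer the method targets.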