Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation

Authors: Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
Researcher Affiliation | Collaboration | Chaowei Fang (1), Ziyin Zhou (1), Junye Chen (2), Hanjing Su (3), Qingyao Wu (4), Guanbin Li (2,5)*. (1) School of Artificial Intelligence, Xidian University, Xi'an, China; (2) School of Computer Science and Engineering, Research Institute of Sun Yat-sen University in Shenzhen, Sun Yat-sen University, Guangzhou, China; (3) Tencent; (4) School of Software Engineering, South China University of Technology, Guangzhou, China; (5) Guangdong Province Key Laboratory of Information Security Technology
Pseudocode | No | The paper describes its methods but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code, such as a repository link or an explicit code-release statement.
Open Datasets | Yes | The training images are collected from the COCO (Lin et al. 2014) and LVIS (Gupta, Dollar, and Girshick 2019) datasets, containing 1.04 × 10⁵ images and 1.6 million instance-level masks. Four publicly available datasets are used for evaluating IIS methods: GrabCut (Rother, Kolmogorov, and Blake 2004) contains 50 images with single-object masks. Berkeley (Martin et al. 2001) contains 96 images with 100 object masks. SBD (Hariharan et al. 2011) comprises 8,498 training images with 20,172 polygonal masks and 2,857 validation images with 6,671 instance-level masks; only the validation images are used for evaluation. DAVIS (Perazzi et al. 2016) contains 345 frames randomly sampled from 50 videos, each provided with a high-quality mask.
Dataset Splits | No | The paper uses SBD's validation set for evaluation but does not specify a fixed, reproducible split of its main training data (COCO/LVIS) for validation during training. It states that "30,000 images are randomly selected as the training dataset for each epoch", which indicates dynamic per-epoch sampling rather than a fixed, reproducible split (a sketch of this sampling pattern follows the table).
Hardware Specification | Yes | All computational experiments are performed on a system equipped with two NVIDIA GeForce RTX 3090 GPUs, and the training duration for the proposed method is approximately 48 hours (a data-parallel sketch follows the table).
Software Dependencies | No | Training is executed with a batch size of 24, using PyTorch as the implementation framework. No version numbers are given for PyTorch or any other library.
Experiment Setup | Yes | We choose SegFormer-B0 or SegFormer-B3 (Xie et al. 2021) as the backbone of the segmentation model... Data augmentation is subsequently applied, encompassing random flipping, resizing with a scale factor constrained within the interval [0.75, 1.40], and randomized adjustments to brightness, contrast, and RGB coloration. σ is set to 11. The network parameters are optimized with the Adam algorithm, parameterized with β1 = 0.9 and β2 = 0.999. The model undergoes a training process of 230 epochs, with an initial learning rate of 5 × 10⁻⁴, subsequently attenuated by a factor of 0.1 at the 190th and 220th epochs. Training is executed with a batch size of 24... where α (= 0.8) is a constant (the reported recipe is sketched in code after the table).
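
To make the Dataset Splits concern concrete, here is a minimal sketch (not the authors' code) of the per-epoch sampling the paper describes: a fresh random subset of 30,000 COCO/LVIS images is drawn each epoch, so no fixed train/validation split exists unless the random generator is explicitly seeded. The dataset name and the seeding policy below are assumptions for illustration.

```python
import torch
from torch.utils.data import Subset

def epoch_subset(dataset, num_samples=30_000, seed=None):
    """Draw a fresh random subset of `num_samples` items for one training epoch."""
    gen = torch.Generator()
    if seed is not None:
        gen.manual_seed(seed)  # seeding here would make the draw reproducible;
                               # the paper does not state that this is done
    indices = torch.randperm(len(dataset), generator=gen)[:num_samples]
    return Subset(dataset, indices.tolist())

# Usage (names hypothetical): rebuild the subset at the start of every epoch.
# for epoch in range(230):
#     loader = torch.utils.data.DataLoader(
#         epoch_subset(cocolvis_train, seed=epoch), batch_size=24)
```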
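The hardware row states that two RTX 3090 GPUs were used but not how training was distributed across them. A minimal sketch, assuming a plain PyTorch data-parallel setup; the wrapper choice and the placeholder model are assumptions, not taken from the paper.

```python
import torch

model = torch.nn.Conv2d(3, 1, 3)  # placeholder, not the paper's model
if torch.cuda.device_count() >= 2:
    # assumption: simple single-node data parallelism over both GPUs
    model = torch.nn.DataParallel(model, device_ids=[0, 1])
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```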
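The optimization schedule and augmentations reported in the Experiment Setup row map onto standard PyTorch/torchvision components. The sketch below reproduces the stated hyperparameters (Adam with β1 = 0.9, β2 = 0.999, lr 5 × 10⁻⁴ decayed by 0.1 at epochs 190 and 220, 230 epochs, batch size 24); the model is a placeholder, RandomAffine stands in for the scale jitter, and the photometric-jitter strengths are assumptions the paper does not specify.

```python
import torch
import torchvision.transforms as T

# Augmentations: random flip, scale jitter in [0.75, 1.40], photometric jitter.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomAffine(degrees=0, scale=(0.75, 1.40)),  # stand-in for resize jitter
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # strengths assumed
])

model = torch.nn.Conv2d(3, 1, 3)  # stand-in for the SegFormer-B0/B3 model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[190, 220], gamma=0.1)

for epoch in range(230):
    # ... one pass over the 30,000-image epoch subset with batch size 24 ...
    scheduler.step()  # attenuate the learning rate after the 190th and 220th epochs
```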