SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

Authors: Mengyu Wang, Henghui Ding, Jun Hao Liew, Jiajun Liu, Yao Zhao, Yunchao Wei

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the effectiveness of SegRefiner, we conduct comprehensive experiments on various segmentation tasks, including semantic segmentation, instance segmentation, and dichotomous image segmentation. The results demonstrate the superiority of our SegRefiner from multiple aspects.
Researcher Affiliation | Collaboration | Mengyu Wang (1,3), Henghui Ding (4), Jun Hao Liew (5), Jiajun Liu (5), Yao Zhao (1,2,3), Yunchao Wei (1,2,3). Affiliations: 1 Institute of Information Science, Beijing Jiaotong University; 2 Peng Cheng Laboratory; 3 Beijing Key Laboratory of Advanced Information Science and Network; 4 Nanyang Technological University; 5 ByteDance
Pseudocode | Yes |
Algorithm 1 (Training)
    Input: total diffusion steps T, dataset D = {(I, M_fine, M_coarse)_K}
    repeat
        Sample (I, M_fine, M_coarse) ~ D
        Sample t ~ Uniform(1, ..., T)
        Initialize m_0 = M_fine, x_0^{i,j} = [1, 0]
        q(x_t^{i,j} | x_0^{i,j}) = x_0^{i,j} Q̄_t
        Sample x_t^{i,j} ~ q(x_t^{i,j} | x_0^{i,j}), obtaining x_t ∈ {0, 1}^{2×H×W}
        Pixel transition: m_t = x_t[0] ⊙ M_fine + x_t[1] ⊙ M_coarse
        Take a gradient descent step on ∇_θ L(f_θ(I, m_t, t), M_fine)
    until convergence
Algorithm 2 (Inference)
    Input: total diffusion steps T, image and coarse mask (I, M_coarse)
    Initialize x_T = [0, 1], m_T = M_coarse
    for t in {T, T-1, ..., 1} do
        m̃_{0|t}, p_θ(m̃_{0|t}) = f_θ(I, m_t, t)
        p_θ(x_{t-1}^{i,j} | x_t^{i,j}) = x_t^{i,j} P_{θ,t}^{i,j}
        Sample x_{t-1}^{i,j} ~ p_θ(x_{t-1}^{i,j} | x_t^{i,j}), obtaining x_{t-1} ∈ {0, 1}^{2×H×W}
        Pixel transition: m_{t-1} = x_{t-1}[0] ⊙ m̃_{0|t} + x_{t-1}[1] ⊙ M_coarse
    return m_0
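The forward transition at the heart of Algorithm 1 can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions: the helper name sample_mt, the tensor shapes, and composing the cumulative transition as a running product of β_t are ours; it is not the authors' released implementation.

```python
# Minimal sketch of sampling m_t from (M_fine, M_coarse) as in Algorithm 1.
# Schedule values (T = 6, beta linear from 0.8 to 0) follow the quoted setup;
# helper names and shapes are assumptions for illustration only.
import torch

T = 6
betas = torch.linspace(0.8, 0.0, T)            # beta_1 ... beta_T, with beta_T = 0
keep_fine_prob = torch.cumprod(betas, dim=0)   # P(pixel still in "fine" state at step t)

def sample_mt(m_fine: torch.Tensor, m_coarse: torch.Tensor, t: int) -> torch.Tensor:
    """Sample the intermediate mask m_t for masks of shape (B, 1, H, W)."""
    p = float(keep_fine_prob[t - 1])
    # x_t[0] in the algorithm: 1 where the pixel keeps its fine value, 0 where it
    # has transitioned to the coarse state.
    x_fine = torch.bernoulli(torch.full_like(m_fine, p))
    return x_fine * m_fine + (1.0 - x_fine) * m_coarse

# Training step sketch: the denoising network f_theta takes (I, m_t, t) and is
# supervised against M_fine, as in the last line of Algorithm 1:
# loss = criterion(f_theta(image, sample_mt(m_fine, m_coarse, t), t), m_fine)
```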
Open Source Code | Yes | The source code and trained models are available at github.com/MengyuWang826/SegRefiner.
Open Datasets | Yes | LR-SegRefiner is trained on the LVIS dataset [20], whereas HR-SegRefiner is trained on a composite dataset merged from two high-resolution datasets, DIS5K [40] and ThinObject-5K [32]. These datasets were chosen due to their highly accurate pixel-level annotations, thereby facilitating the training of our model to capture fine details more effectively. ... we select the widely-used COCO dataset [34] with LVIS [20] annotations. ... BIG dataset [12]
Dataset Splits | Yes | The evaluation metrics are the Mask AP and Boundary AP. It is worth noting that these metrics are computed using the LVIS [20] annotations on the COCO validation set.
Hardware Specification | Yes | All the following experiments were conducted on 8 NVIDIA RTX 3090. ... All experiments are conducted on 8 NVIDIA RTX 3090 GPUs with Pytorch.
Software Dependencies | No | The paper mentions "Pytorch" as a software dependency but does not specify its version number, nor does it list versions for any other key software components.
Experiment Setup | Yes | Model Architecture: Following [39], we employ U-Net for our denoising network. We modify the U-Net to take in a 4-channel input (concatenation of the image and the corresponding mask m_t) and output a 1-channel refined mask. Both input and output resolution are set to 256×256. All other components remain unchanged apart from the aforementioned modifications. ... Objective Function: L = L_bce + α L_texture, where the texture loss is characterized as an L1 loss between the segmentation gradient magnitudes of the predicted mask and the ground-truth mask. α is set to 5 to balance the magnitudes of the two losses. ... Noise Schedule: we use much fewer timesteps (T = 6 in this work) to ensure efficient inference. We designate β_T = 0 such that x_T = [0, 1] for all pixels and m_T = M_coarse (Eq. (7)). Following DDIM [46], we directly set a linear noise schedule from 0.8 to 0 on β_t. ... Training Strategy: We employed double random crop as the primary data augmentation technique. ... The AdamW optimizer is used with an initial learning rate of 4×10^-4. We use a multi-step learning rate schedule, which decays by 0.5 at steps 80k and 100k. Subsequently, the HR-SegRefiner is obtained from 40k iterations of fine-tuning based on the 80k checkpoint of the LR-SegRefiner. Batch size is set to 8 on each GPU.
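The quoted objective L = L_bce + α·L_texture (α = 5) can be sketched as follows. The paper's excerpt only states that the texture term is an L1 loss between the gradient magnitudes of the predicted and ground-truth masks; the Sobel-filter approximation of that gradient magnitude and the function names below are assumptions, not the authors' implementation.

```python
# Hedged sketch of L = L_bce + alpha * L_texture with alpha = 5, assuming Sobel
# kernels for the "gradient magnitude" term. Masks are float tensors of shape (B, 1, H, W).
import torch
import torch.nn.functional as F

def gradient_magnitude(mask: torch.Tensor) -> torch.Tensor:
    """Approximate spatial gradient magnitude of a (B, 1, H, W) mask with Sobel kernels."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def segrefiner_loss(pred_logits: torch.Tensor, gt_mask: torch.Tensor, alpha: float = 5.0):
    """BCE on the predicted mask plus an L1 loss between gradient magnitudes."""
    l_bce = F.binary_cross_entropy_with_logits(pred_logits, gt_mask)
    pred = torch.sigmoid(pred_logits)
    l_texture = F.l1_loss(gradient_magnitude(pred), gradient_magnitude(gt_mask))
    return l_bce + alpha * l_texture
```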