SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
Authors: Mengyu Wang, Henghui Ding, Jun Hao Liew, Jiajun Liu, Yao Zhao, Yunchao Wei
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the effectiveness of SegRefiner, we conduct comprehensive experiments on various segmentation tasks, including semantic segmentation, instance segmentation, and dichotomous image segmentation. The results demonstrate the superiority of our SegRefiner from multiple aspects. |
| Researcher Affiliation | Collaboration | Mengyu Wang (1,3), Henghui Ding (4), Jun Hao Liew (5), Jiajun Liu (5), Yao Zhao (1,2,3), Yunchao Wei (1,2,3). (1) Institute of Information Science, Beijing Jiaotong University; (2) Peng Cheng Laboratory; (3) Beijing Key Laboratory of Advanced Information Science and Network; (4) Nanyang Technological University; (5) ByteDance |
| Pseudocode | Yes | Algorithm 1 (Training). Input: total diffusion steps T, dataset D = {(I, M_fine, M_coarse)}_K. Repeat: sample (I, M_fine, M_coarse) ~ D; sample t ~ Uniform(1, ..., T); initialize m_0 = M_fine, x_0^{i,j} = [1, 0]; q(x_t^{i,j} | x_0^{i,j}) = x_0^{i,j} Q̄_t; sample x_t^{i,j} ~ q(x_t^{i,j} | x_0^{i,j}) to get x_t ∈ {0, 1}^{2×H×W}; pixel transition m_t = x_t[0]·M_fine + x_t[1]·M_coarse; take a gradient descent step on ∇_θ L(f_θ(I, m_t, t), M_fine); until convergence. Algorithm 2 (Inference). Input: total diffusion steps T, image and coarse mask (I, M_coarse). Initialize x_T = [0, 1], m_T = M_coarse. For t in {T, T−1, ..., 1}: m̃_{0|t}, p_θ(m̃_{0|t}) = f_θ(I, m_t, t); p_θ(x_{t−1}^{i,j} | x_t^{i,j}) = x_t^{i,j} P_{θ,t}^{i,j}; sample x_{t−1}^{i,j} ~ p_θ(x_{t−1}^{i,j} | x_t^{i,j}) to get x_{t−1} ∈ {0, 1}^{2×H×W}; pixel transition m_{t−1} = x_{t−1}[0]·m̃_{0|t} + x_{t−1}[1]·M_coarse. Return m_0. (A hedged PyTorch sketch of both loops appears after the table.) |
| Open Source Code | Yes | The source code and trained models are available at github.com/MengyuWang826/SegRefiner. |
| Open Datasets | Yes | LR-SegRefiner is trained on the LVIS dataset [20], whereas HR-SegRefiner is trained on a composite dataset merged from two high-resolution datasets, DIS5K [40] and ThinObject-5K [32]. These datasets were chosen due to their highly accurate pixel-level annotations, thereby facilitating the training of our model to capture fine details more effectively. ... we select the widely-used COCO dataset [34] with LVIS [20] annotations. ... BIG dataset [12] |
| Dataset Splits | Yes | The evaluation metrics are the Mask AP and Boundary AP. It is worth noting that these metrics are computed using the LVIS [20] annotations on the COCO validation set. |
| Hardware Specification | Yes | All the following experiments were conducted on 8 NVIDIA RTX 3090 GPUs. ... All experiments are conducted on 8 NVIDIA RTX 3090 GPUs with PyTorch. |
| Software Dependencies | No | The paper mentions PyTorch as a software dependency but does not specify its version, nor does it list versions for any other key software components. |
| Experiment Setup | Yes | Model Architecture: Following [39], we employ U-Net for our denoising network. We modify the U-Net to take in a 4-channel input (concatenation of the image and the corresponding mask m_t) and output a 1-channel refined mask. Both input and output resolutions are set to 256×256. All other settings remain unchanged apart from the aforementioned modifications. ... Objective Function: L = L_bce + α·L_texture, where the texture loss is characterized as an L1 loss between the gradient magnitudes of the predicted mask and the ground-truth mask. α is set to 5 to balance the magnitude of both losses. ... Noise Schedule: we use much fewer timesteps (T = 6 in this work) to ensure efficient inference. We designate β_T = 0 such that x_T = [0, 1] for all pixels and m_T = M_coarse (Eq. (7)). Following DDIM [46], we directly set a linear noise schedule on β_t from 0.8 to 0. ... Training Strategy: We employed double random crop as the primary data augmentation technique. ... The AdamW optimizer is used with an initial learning rate of 4×10⁻⁴. We use a multi-step learning rate schedule, which decays by 0.5 at steps 80k and 100k. Subsequently, the HR-SegRefiner is obtained from 40k iterations of fine-tuning based on the 80k checkpoint of the LR-SegRefiner. The batch size is set to 8 per GPU. (A hedged sketch of this objective and noise schedule follows the algorithm sketch after the table.) |
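
The training and inference procedures quoted in the Pseudocode row translate fairly directly into code. Below is a minimal, hedged PyTorch sketch of Algorithms 1 and 2, not the authors' released implementation: the denoiser signature `model(x, t)`, the linear placeholder for the cumulative stay-fine probabilities `beta_bar`, and the way per-pixel confidence drives the reverse transition are all simplifying assumptions introduced here for illustration.

```python
# Minimal PyTorch sketch of the discrete-diffusion training and inference
# loops (Algorithms 1-2). `model`, the linear `beta_bar` placeholder, and the
# confidence-driven reverse transition are illustrative assumptions.
import torch
import torch.nn.functional as F

T = 6                                      # total diffusion steps (paper: T = 6)
# beta_bar[t]: probability that a pixel is still in the "fine" state at step t
# (beta_bar[0] = 1; beta_bar[T] = 0 so that m_T = M_coarse). Linear placeholder.
beta_bar = torch.linspace(1.0, 0.0, T + 1)

def training_step(model, image, m_fine, m_coarse):
    """One loss evaluation of Algorithm 1 (the gradient step is left to the caller)."""
    B, _, H, W = m_fine.shape
    t = torch.randint(1, T + 1, (B,), device=image.device)          # t ~ U(1..T)
    keep = beta_bar.to(image.device)[t].view(B, 1, 1, 1)
    # Sample x_t: each pixel keeps its fine value with probability beta_bar[t].
    stay_fine = torch.rand(B, 1, H, W, device=image.device) < keep
    m_t = torch.where(stay_fine, m_fine, m_coarse)                   # pixel transition
    pred = model(torch.cat([image, m_t], dim=1), t)                  # 4-channel input
    return F.binary_cross_entropy_with_logits(pred, m_fine)

@torch.no_grad()
def refine(model, image, m_coarse):
    """Algorithm 2: iteratively move pixels from the coarse to the refined state."""
    m_t = m_coarse.clone()
    coarse_state = torch.ones_like(m_coarse, dtype=torch.bool)       # x_T = [0, 1]
    for t in range(T, 0, -1):
        t_batch = torch.full((image.shape[0],), t, device=image.device)
        m0_pred = torch.sigmoid(model(torch.cat([image, m_t], dim=1), t_batch))
        conf = torch.max(m0_pred, 1.0 - m0_pred)                     # per-pixel confidence
        # Simplified reverse transition: a coarse-state pixel flips to the refined
        # state with a probability set by the confidence and the schedule; the
        # paper's exact per-pixel matrix P_{theta,t} differs in detail.
        ratio = float((beta_bar[t - 1] - beta_bar[t]) / (1.0 - beta_bar[t] + 1e-8))
        flip = coarse_state & (torch.rand_like(conf) < conf * ratio)
        coarse_state = coarse_state & ~flip
        m_t = torch.where(coarse_state, m_coarse, (m0_pred > 0.5).float())
    return m_t                                                       # refined mask m_0
```

In the paper the reverse transition is built from the predicted confidence and the β schedule via the per-pixel matrix P_{θ,t}; the `conf * ratio` line above is only a stand-in for that construction.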
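
The Experiment Setup row specifies the objective L = L_bce + α·L_texture with α = 5 and a linear β schedule from 0.8 to 0 over T = 6 steps. The sketch below implements both under stated assumptions; in particular, using Sobel filters to compute the "gradient magnitude" in the texture loss is an assumption, since the quoted excerpt does not name the exact operator.

```python
# Hedged sketch of the objective L = L_bce + alpha * L_texture (alpha = 5) and
# of the linear beta schedule (0.8 -> 0 over T = 6 steps). Sobel filters for
# the gradient magnitude are an assumption, not a detail from the quote.
import torch
import torch.nn.functional as F

def gradient_magnitude(mask):
    """Per-pixel gradient magnitude of a (B, 1, H, W) mask via Sobel filters."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=mask.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def segrefiner_loss(pred_logits, m_fine, alpha=5.0):
    """L = L_bce + alpha * L_texture, with alpha = 5 balancing the two terms."""
    l_bce = F.binary_cross_entropy_with_logits(pred_logits, m_fine)
    pred = torch.sigmoid(pred_logits)
    l_texture = F.l1_loss(gradient_magnitude(pred), gradient_magnitude(m_fine))
    return l_bce + alpha * l_texture

# Linear schedule on beta_t from 0.8 to 0, with beta_T = 0 so that
# beta_bar_T = prod_t beta_t = 0 and m_T equals the coarse mask exactly.
T = 6
beta = torch.linspace(0.8, 0.0, T)                                 # beta_1 ... beta_T
beta_bar = torch.cat([torch.ones(1), torch.cumprod(beta, dim=0)])  # beta_bar_0 = 1
```

The resulting `beta_bar` table has T + 1 entries (β̄_0 = 1, β̄_T = 0) and could replace the linear placeholder used in the algorithm sketch above.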