Neural Diffusion Distance for Image Segmentation

Authors: Jian Sun, Zongben Xu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply neural diffusion distance to two segmentation tasks, i.e., hierarchical image segmentation and weakly supervised semantic segmentation. For the first task, we design a hierarchical clustering algorithm based on NDD, achieving significantly higher segmentation accuracy. For the second task, with NDD as guidance, we propose an attention module using regional feature pooling for weakly supervised semantic segmentation. It achieves state-of-the-art semantic segmentation results on the PASCAL VOC 2012 segmentation dataset [23] in the weakly supervised setting. Our algorithm achieves significantly better accuracies on the test set of BSD500. For example, Deep NCut is a state-of-the-art deep spectral segmentation method based on differentiable eigen-decomposition, and our method achieves nearly 0.1 higher accuracy. We achieve 65.8% and 66.3% on the val and test sets, which are higher than the state-of-the-art AISI method, which also uses ResNet-101 and the same training set. Ablation study: as shown in Tab. 4, without regional feature pooling, i.e., ours (w/o RFP), the accuracy on the val set decreases from 65.8 to 44.6.
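The hierarchical clustering step quoted above can be sketched in a few lines. This is a minimal stand-in, not the paper's algorithm: SciPy's average-linkage agglomerative clustering is applied to a toy symmetric distance matrix that plays the role of the NDD matrix, and the two-cluster cut is an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy stand-in for a neural diffusion distance (NDD) matrix over image
# regions: symmetric with a zero diagonal. In the paper this matrix comes
# from spec-diff-net; here it is random data for illustration only.
rng = np.random.default_rng(0)
n = 8
d = rng.random((n, n))
ndd = (d + d.T) / 2
np.fill_diagonal(ndd, 0.0)

# Agglomerative clustering on the condensed distance matrix gives a
# segmentation hierarchy; cutting it at a chosen level yields segments.
Z = linkage(squareform(ndd), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # one cluster label per region
```

Cutting the linkage tree at different levels yields the coarse-to-fine segmentations that a hierarchical method produces.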
Researcher Affiliation | Academia | Jian Sun and Zongben Xu, School of Mathematics and Statistics, Xi'an Jiaotong University, P. R. China; {jiansun,zbxu}@xjtu.edu.cn
Pseudocode | No | The paper describes the algorithms and processes in text and equations, but does not provide a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement or link regarding open-source code availability for the described methodology.
Open Datasets | Yes | We train spec-diff-net on the BSD500 dataset [28] by auto-differentiation; each image has multiple human-labeled boundaries. We use ResNet-101 (excluding the classification layer) pre-trained on MS-COCO [33] as in [20] for feature extraction and train spec-diff-net for 160,000 steps. We train the weakly supervised segmentation network (spec-diff-net is fixed and pre-trained on 500 images of BSD500) on the VOC 2012 segmentation training set with augmented data [13] using only image labels.
Dataset Splits | Yes | Table 1 presents training (300 images in train + val of the BSD500 dataset) and test (200 images in test of the BSD500 dataset) accuracies, measured by the cosine similarity of the estimated neural diffusion similarity matrix K_D with the target similarity matrix K_gt, using different values of the hyper-parameter T and of the initialized t in the approximate spectral decomposition.
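One plausible reading of the accuracy metric quoted above is the cosine similarity between the two similarity matrices flattened into vectors. The function name and the flattening convention below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def matrix_cosine(K_est, K_gt):
    """Cosine similarity between two similarity matrices, treated as
    flattened vectors (an assumed reading of the paper's metric)."""
    a, b = K_est.ravel(), K_gt.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

K = np.eye(4)
score = matrix_cosine(K, K)
print(score)  # identical matrices give similarity 1.0
```

Under this reading, a perfectly estimated K_D scores 1.0 against K_gt, and the reported train/test numbers measure how far the estimate falls short.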
Hardware Specification | Yes | It takes 0.2 seconds to output the neural diffusion distance for an image of size 321×481 on a GeForce GTX TITAN X GPU.
Software Dependencies | No | The paper mentions using ResNet-101 and specific datasets, but does not provide version numbers for software components or libraries (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We use ResNet-101 (excluding the classification layer) pre-trained on MS-COCO [33] as in [20] for feature extraction and train spec-diff-net for 160,000 steps. We empirically found that the eigenvalues of the transition matrix P decrease quickly from the maximal value of one, so we set Ne = 50 in the approximate spectral decomposition to cover the dominant spectrum. U0 in the simultaneous iteration is initialized by Ne columns of one-hot vectors with ones uniformly located on the feature grid. The neighborhood width when computing W in Eq. (1) is set to 17 on the feature grid. In the following, we set T = 2 and initialize t = 10.
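The pipeline described in this setup — an affinity matrix W from features, a row-normalized transition matrix P, simultaneous (subspace) iteration from one-hot columns for the dominant Ne eigenpairs, and a diffusion distance at time t — can be sketched as follows. The feature matrix is random toy data, and the sizes (n = 64, ne = 8 rather than Ne = 50, a dense rather than 17-wide neighborhood) are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, ne, t = 64, 8, 10  # points, kept eigenpairs (paper: Ne = 50), diffusion time

# Gaussian affinity W from random features (the paper uses spec-diff-net
# features restricted to a local neighborhood on the feature grid).
feats = rng.random((n, 4))
sq = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq)

# Row-normalized transition matrix P = D^{-1} W.
P = W / W.sum(axis=1, keepdims=True)

# Simultaneous (subspace) iteration for the ne dominant eigenvectors,
# initialized with one-hot columns spread uniformly over the points,
# as in the quoted description of U0.
U = np.eye(n)[:, :: n // ne]
for _ in range(50):
    U, _ = np.linalg.qr(P @ U)
lam = np.diag(U.T @ P @ U)  # Ritz values approximating the top eigenvalues

# Diffusion distance at time t from the truncated spectrum: embed each
# point via eigenvectors scaled by lam**t, then take Euclidean distances.
Phi = U * lam ** t
D2 = ((Phi[:, None, :] - Phi[None, :, :]) ** 2).sum(-1)
print(D2.shape)  # squared diffusion distances between all point pairs
```

The truncation to ne eigenpairs is what makes the fixed Ne = 50 choice cheap: with eigenvalues decaying quickly from one, the discarded terms contribute little to the time-t diffusion distance.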