Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation
Authors: Ping Hu, Stan Sclaroff, Kate Saenko
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our framework through comprehensive experiments on multiple challenging benchmarks, and show that our method achieves significant accuracy improvement over previous approaches for large-scale open-set segmentation. |
| Researcher Affiliation | Collaboration | 1Boston University 2MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Datasets. We adopt the two challenging benchmarks with large category sets and sufficient image samples for experiments, which are ADE20K [64] and Pascal-Context [42]. |
| Dataset Splits | Yes | The ADE20K dataset contains 20K/2K/3K images for training/validation/testing respectively and provides dense annotations of 150 categories, including both objects and stuff. The Pascal-Context dataset consists of diverse indoor and outdoor images, split into 4998 training images and 5104 validation images. |
| Hardware Specification | Yes | We use PyTorch for model implementations and conduct all the experiments on a Titan Xp GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' for model implementations but does not specify its version or the versions of other software dependencies such as DeepLab V3+, ResNet-50, or word2vec. |
| Experiment Setup | Yes | On both datasets, we set the weight λ = 0.05 for the loss in Eq. 2, adopt batch size 8, and apply SGD [48] with learning rate 5×10⁻⁴, momentum 0.9, and weight decay 5×10⁻⁴ to optimize the model for 20K iterations. Data augmentation including random horizontal flipping, random scaling (from 0.75 to 2), random cropping, and color jittering is applied during training. During testing, we input images at resolution 513×513 and threshold the output at 0.5 to obtain binary predictions. |
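The test-time step quoted above (thresholding per-pixel scores at 0.5 to obtain a binary mask) can be sketched as follows. This is an illustrative, dependency-free sketch, not the authors' implementation; the `binarize` helper and the sample probabilities are invented for the example.

```python
def binarize(probs, threshold=0.5):
    """Threshold per-pixel probabilities into a binary mask.

    `probs` is a 2-D list of values in [0, 1]; pixels at or above
    `threshold` (0.5, as reported in the paper) map to 1, the rest to 0.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


# Hypothetical 2x2 output map for one class.
probs = [[0.9, 0.3],
         [0.5, 0.1]]
mask = binarize(probs)
# mask -> [[1, 0], [1, 0]]
```

In a PyTorch pipeline the same operation would typically be a single tensor comparison, e.g. `(output >= 0.5).long()`.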