FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

Authors: Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9]. Both are highly challenging due to their complex taxonomies. COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes. We investigate different paradigms to leverage synthetic images, including (1) jointly training on real and synthetic images, and (2) pre-training on synthetic images and then fine-tuning on real ones. We observe remarkable gains (e.g., 48.7 → 52.0 mIoU) under both paradigms.
Researcher Affiliation | Collaboration | Lihe Yang (1), Xiaogang Xu (2,3), Bingyi Kang (4), Yinghuan Shi (5), Hengshuang Zhao (1); (1) The University of Hong Kong, (2) Zhejiang Lab, (3) Zhejiang University, (4) ByteDance, (5) Nanjing University
Pseudocode | No | The paper describes algorithmic steps in text and provides mathematical formulas, but it does not include a clearly labeled pseudocode block or algorithm figure.
Open Source Code | Yes | https://github.com/LiheYoung/FreeMask
Open Datasets | Yes | We evaluate the effectiveness of synthetic images on two widely adopted semantic segmentation benchmarks, i.e., ADE20K [76] and COCO-Stuff [9].
Dataset Splits | Yes | COCO-Stuff is composed of 118,287/5,000 training/validation images, spanning 171 semantic classes. In comparison, ADE20K is more limited in training images, containing 20,210/2,000 training/validation images and covering 150 classes.
Hardware Specification | Yes | We use 8 Nvidia Tesla V100 GPUs for our training experiments. For example, it takes around 5.8 seconds to synthesize a single image with a V100 GPU. In practice, we speed up the synthesis process with 24 V100 GPUs.
Software Dependencies | No | The paper mentions using the "MMSegmentation codebase" but does not specify its version or the versions of other core software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | In pre-training, we exactly follow the hyper-parameters of regular training. In joint training, we over-sample real images to match the number of synthetic images. The learning rate and batch size are the same as in the regular training paradigm. Since the batch size of real images is effectively halved in each iteration, we double the training iterations so that real images are iterated over for the same number of epochs as in regular training. Other hyper-parameters are detailed in the appendix.
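As a concrete illustration of the joint-training setup quoted above, below is a minimal PyTorch-style sketch of over-sampling the real dataset to match the synthetic one. This is not code from the FreeMask repository; the dataset objects (`real_ds`, `synth_ds`) and the batch size are hypothetical placeholders.

```python
# Minimal sketch (assumed, not from the FreeMask repo): over-sample real
# images to match the number of synthetic images, per the joint-training
# paradigm described above. `real_ds` and `synth_ds` are placeholder
# torch.utils.data.Dataset objects.
import math
from torch.utils.data import ConcatDataset, DataLoader

def build_joint_dataset(real_ds, synth_ds):
    # Repeat the real dataset until it is at least as large as the
    # synthetic one, so each shuffled batch is roughly half real images.
    repeats = math.ceil(len(synth_ds) / len(real_ds))
    oversampled_real = ConcatDataset([real_ds] * repeats)
    return ConcatDataset([oversampled_real, synth_ds])

# Usage (hypothetical): since the real batch size is effectively halved
# per iteration, the total iteration count is doubled so real images are
# seen for the same number of epochs as in regular training.
# joint_ds = build_joint_dataset(real_ds, synth_ds)
# loader = DataLoader(joint_ds, batch_size=16, shuffle=True, num_workers=8)
```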