Diving Segmentation Model into Pixels

Authors: Chen Gan, Zihao Yin, Kelei He, Yang Gao, Junfeng Zhang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations of multiple learning paradigms, including unsupervised domain adaptation and semi-/fully-supervised segmentation, show that PiXL surpasses state-of-the-art performance, especially when annotated images are scarce. Visualization of the embedding space further demonstrates that pixel learning attains a superior representation of pixel features.
Researcher Affiliation | Academia | Chen Gan (2,3), Zihao Yin (2,3), Kelei He (1,3), Yang Gao (2,3), Junfeng Zhang (1,3). 1: Medical School of Nanjing University; 2: State Key Laboratory for Novel Software Technology, Nanjing University; 3: National Institute of Healthcare Data Science, Nanjing University. {chengan,zihao.yin}@smail.nju.edu.cn, {hkl,gaoy,jfzhang}@nju.edu.cn
Pseudocode | No | The paper describes its framework and components through textual explanations and figures but does not provide any formal pseudocode or algorithm blocks.
Open Source Code | No | The code will be available upon acceptance.
Open Datasets | Yes | GTA5: a large-scale synthetic dataset containing 24,966 annotated images. SYNTHIA: a collection of generated urban images comprising 9,400 images. Cityscapes: a real-world street-scene dataset containing 2,975 training images and 500 validation images.
Dataset Splits | Yes | Fully supervised learning: we use all the training images from Cityscapes to train PiXL and evaluate it on the corresponding validation set. Semi-supervised learning: we randomly select 1/4, 1/8, and 1/30 of the labeled images from the Cityscapes training set while using the remaining images as unlabeled data to train PiXL; performance is reported on the validation set (see the split sketch after the table).
Hardware Specification | Yes | The model is trained on a single Tesla V100 with 32 GB memory.
Software Dependencies | No | The paper mentions specific model backbones (MiT-B5, ResNet101, DeepLabV3+ head) and optimizers (AdamW) but does not provide version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation.
Experiment Setup | Yes | We follow the training strategies and parameters of Vayyat et al. (2022), i.e., a batch size of 2; the AdamW optimizer with a learning rate of 6e-5 for the encoder and 6e-4 for the decoder under a linear warmup policy; DACS data augmentation; a rare-class sampling strategy on labeled training data; and the self-training method for unlabeled training data. Moreover, we set the training epochs to 60,000 and set η to 0.2 initially, decreasing it linearly to η = 0.001 during training (see the optimizer and η-schedule sketch below).
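The semi-supervised splits in the Dataset Splits row amount to a random partition of the Cityscapes training list into labeled and unlabeled subsets. A minimal sketch, assuming plain Python lists of file names; the seed and the helper name split_cityscapes are illustrative choices, not details from the paper:

```python
import random

def split_cityscapes(image_list, labeled_fraction=1/8, seed=0):
    """Randomly partition training images into labeled and unlabeled subsets.

    labeled_fraction: e.g. 1/4, 1/8, or 1/30, as in the semi-supervised setting.
    The seed is an illustrative choice; the paper does not specify one.
    """
    rng = random.Random(seed)
    images = list(image_list)
    rng.shuffle(images)
    n_labeled = max(1, round(labeled_fraction * len(images)))
    return images[:n_labeled], images[n_labeled:]

# Example with the 2,975 Cityscapes training images (file names are placeholders):
train_list = [f"img_{i:04d}.png" for i in range(2975)]
labeled, unlabeled = split_cityscapes(train_list, labeled_fraction=1/8)
print(len(labeled), len(unlabeled))  # 372 labeled, 2603 unlabeled
```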
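The Experiment Setup row can be mirrored by an optimizer configuration with separate encoder/decoder learning rates and a linearly decaying η. A hedged PyTorch sketch, assuming the model exposes encoder and decoder attributes and a warmup length of 1,500 steps; neither detail is stated in the excerpt, and the 60,000 figure is read here as the total number of training steps:

```python
import torch

def build_optimizer(model, warmup_steps=1500):
    """AdamW with separate learning rates for encoder (6e-5) and decoder (6e-4),
    plus a linear warmup. `model.encoder`/`model.decoder` and the warmup length
    are illustrative assumptions."""
    optimizer = torch.optim.AdamW([
        {"params": model.encoder.parameters(), "lr": 6e-5},
        {"params": model.decoder.parameters(), "lr": 6e-4},
    ])
    # Linear ramp to the base learning rates, then constant: one simple reading
    # of "linear warmup policy"; the post-warmup schedule is not specified here.
    warmup = lambda step: min(1.0, (step + 1) / warmup_steps)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup)
    return optimizer, scheduler

def eta_schedule(step, total_steps=60000, eta_start=0.2, eta_end=0.001):
    """Linearly decrease eta from 0.2 to 0.001 over training, as described above."""
    t = min(step / total_steps, 1.0)
    return eta_start + t * (eta_end - eta_start)
```

The two parameter groups keep the (typically pretrained) encoder on a learning rate ten times smaller than the decoder's, matching the 6e-5 / 6e-4 split reported above.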