Diving Segmentation Model into Pixels
Authors: Chen Gan, Zihao Yin, Kelei He, Yang Gao, Junfeng Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations across multiple learning paradigms, including unsupervised domain adaptation and semi-/fully-supervised segmentation, show that PiXL outperforms state-of-the-art methods, especially when annotated images are scarce. Visualization of the embedding space further demonstrates that pixel learning attains a superior representation of pixel features. |
| Researcher Affiliation | Academia | Chen Gan²,³, Zihao Yin²,³, Kelei He¹,³, Yang Gao²,³, Junfeng Zhang¹,³ (¹Medical School of Nanjing University; ²State Key Laboratory for Novel Software Technology, Nanjing University; ³National Institute of Healthcare Data Science, Nanjing University). {chengan,zihao.yin}@smail.nju.edu.cn, {hkl,gaoy,jfzhang}@nju.edu.cn |
| Pseudocode | No | The paper describes its framework and components through textual explanations and figures but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The code will be available upon acceptance. |
| Open Datasets | Yes | GTA5: a large-scale synthetic dataset containing 24,966 annotated images. SYNTHIA: a collection of generated urban images comprising 9,400 images. Cityscapes: a real-world street-scene dataset containing 2,975 training images and 500 validation images. |
| Dataset Splits | Yes | Fully supervised learning: We use all the training images from Cityscapes to train our PiXL and evaluate it on the corresponding validation set. Semi-supervised learning: We randomly select 1/4, 1/8, and 1/30 labeled images from the Cityscapes training set while using the remaining images as unlabeled data to train PiXL. Performance is reported on the validation set. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | The model is trained on a single Tesla V100 with 32 GB memory. |
| Software Dependencies | No | The paper mentions specific model backbones (MiT-B5, ResNet101, DeepLabV3+ head) and optimizers (AdamW) but does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | We follow the training strategies and parameters of Vayyat et al. (2022), i.e., a batch size of 2; the AdamW optimizer with a learning rate of 6×10⁻⁵ for the encoder and 6×10⁻⁴ for the decoder under a linear warmup policy; DACS data augmentation; rare-class sampling on labeled training data; and self-training for unlabeled training data. Moreover, we set the training epoch to 60,000, set η to 0.2 initially, and decrease it linearly to η = 0.001 during training. |
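
To make the reported schedule concrete, below is a minimal PyTorch-style sketch of the optimizer and η settings from the Experiment Setup row. The framework is not stated in the paper, so PyTorch, the placeholder modules, and the warmup length are assumptions; only the learning rates (6×10⁻⁵ / 6×10⁻⁴), the 60,000-step budget, and the linear decay of η from 0.2 to 0.001 come from the report.

```python
import torch

# Placeholder modules standing in for the MiT-B5 encoder and DeepLabV3+-style
# head mentioned in the report; the real architectures are not reproduced here.
encoder = torch.nn.Linear(8, 8)
decoder = torch.nn.Linear(8, 8)

# AdamW with the per-module learning rates reported in the setup:
# 6e-5 for the encoder and 6e-4 for the decoder.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 6e-5},
    {"params": decoder.parameters(), "lr": 6e-4},
])

TOTAL_STEPS = 60_000   # reported training length
WARMUP_STEPS = 1_500   # assumed warmup length; the report does not give one


def warmup_scale(step: int) -> float:
    """Linear warmup to the base learning rate, then constant."""
    return min(1.0, (step + 1) / WARMUP_STEPS)


def eta_at(step: int, start: float = 0.2, end: float = 0.001) -> float:
    """Linearly decay the loss weight eta from 0.2 to 0.001 over training."""
    frac = min(step / TOTAL_STEPS, 1.0)
    return start + (end - start) * frac


# scheduler.step() would be called once per training iteration.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_scale)

for step in (0, 1_499, 30_000, 59_999):
    print(step, round(warmup_scale(step), 3), round(eta_at(step), 4))
```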
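
The semi-supervised protocol in the Dataset Splits row can likewise be sketched. This is an assumed illustration only: images are addressed by index, and the actual Cityscapes file handling and the paper's sampling code are omitted.

```python
import random


def split_labeled_unlabeled(num_images: int = 2975, fraction: float = 1 / 8,
                            seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly pick a labeled subset; the remaining images are used as unlabeled data."""
    rng = random.Random(seed)
    indices = list(range(num_images))
    rng.shuffle(indices)
    num_labeled = round(num_images * fraction)
    return sorted(indices[:num_labeled]), sorted(indices[num_labeled:])


# 1/30 split of the 2,975 Cityscapes training images.
labeled, unlabeled = split_labeled_unlabeled(fraction=1 / 30)
print(len(labeled), len(unlabeled))  # 99 labeled, 2876 unlabeled
```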