Segmenting Transparent Objects in the Wild with Transformer

Authors: Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark more than 20 recent semantic segmentation methods, demonstrating that Trans2Seg significantly outperforms all the CNN-based methods, showing the proposed algorithm's potential ability to solve transparent object segmentation. Code is available in github.com/xieenze/Trans2Seg. [...] 5 Experiments
Researcher Affiliation | Collaboration | 1 The University of Hong Kong, 2 SenseTime Research, 3 Nanjing University, 4 Huawei Noah's Ark Lab
Pseudocode | No | The paper states 'The pseudo code of small conv head is shown in Figure 4,' but Figure 4 is a diagram illustrating the network architecture and data flow, not a pseudocode block; no actual pseudocode is presented.
Open Source Code | Yes | Code is available in github.com/xieenze/Trans2Seg.
Open Datasets | Yes | This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. [...] Our Trans10K-v2 dataset is based on the Trans10K dataset [Xie et al., 2020].
Dataset Splits | Yes | Following Trans10K, we use 5000, 1000 and 4428 images for training, validation and testing, respectively.
Hardware Specification | Yes | We use 8 V100 GPUs for all experiments.
Software Dependencies | No | The paper mentions 'We implement Trans2Seg with Pytorch' but does not specify the PyTorch version or the versions of any other software dependencies.
Experiment Setup | Yes | For loss optimization, we use the Adam optimizer with epsilon 1e-8 and weight decay 1e-4. Batch size is 8 per GPU. We set the learning rate to 1e-4, decayed by the poly strategy [Yu et al., 2018], for 50 epochs. [...] For our Trans2Seg, we adopt a Transformer architecture and need to keep the shape of the learned position embedding the same in training/inference, so we directly resize the image to 512x512.
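
The quoted experiment setup maps onto a short PyTorch training-loop configuration. The sketch below is illustrative only, not the authors' released code (which lives at github.com/xieenze/Trans2Seg): the model, dummy dataset, and class count are placeholders, and the poly power of 0.9 is an assumed common default that the paper does not state.

```python
# Minimal sketch of the reported setup: Adam (eps=1e-8, weight decay 1e-4),
# base lr 1e-4 with poly decay over 50 epochs, batch size 8 per GPU, 512x512 inputs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 50
BASE_LR = 1e-4
BATCH_PER_GPU = 8
POLY_POWER = 0.9       # assumption: typical poly-schedule power, not given in the paper
NUM_CLASSES = 12       # placeholder class count, not the exact Trans10K-v2 label set

# Placeholder segmentation network; the real model is the Trans2Seg hybrid CNN+Transformer.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, NUM_CLASSES, 1))

optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR,
                             eps=1e-8, weight_decay=1e-4)

# Dummy data at the fixed 512x512 resolution required by the learned position embedding.
images = torch.randn(16, 3, 512, 512)
masks = torch.randint(0, NUM_CLASSES, (16, 512, 512))
loader = DataLoader(TensorDataset(images, masks), batch_size=BATCH_PER_GPU, shuffle=True)

criterion = nn.CrossEntropyLoss()
total_iters = EPOCHS * len(loader)

step = 0
for epoch in range(EPOCHS):
    for x, y in loader:
        # Poly learning-rate decay: lr = base_lr * (1 - step / total_iters) ** power
        lr = BASE_LR * (1 - step / total_iters) ** POLY_POWER
        for group in optimizer.param_groups:
            group["lr"] = lr
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        step += 1
```

In multi-GPU training the effective batch size would be 8 per GPU across the 8 V100s reported above; the single-process loop here is only meant to show the optimizer, schedule, and input-resolution choices.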