Segmenting Transparent Objects in the Wild with Transformer
Authors: Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark more than 20 recent semantic segmentation methods, demonstrating that Trans2Seg significantly outperforms all the CNN-based methods, showing the proposed algorithm's potential ability to solve transparent object segmentation. Code is available in github.com/xieenze/Trans2Seg. [...] 5 Experiments |
| Researcher Affiliation | Collaboration | 1 The University of Hong Kong, 2 SenseTime Research, 3 Nanjing University, 4 Huawei Noah's Ark Lab |
| Pseudocode | No | The paper states 'The pseudo code of small conv head is shown in Figure 4.' However, Figure 4 is a diagram illustrating the network architecture and data flow, not a pseudocode block. No actual pseudocode is presented. A hedged sketch of such a head is given below the table. |
| Open Source Code | Yes | Code is available in github.com/xieenze/Trans2Seg. |
| Open Datasets | Yes | This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. [...] Our Trans10K-v2 dataset is based on Trans10K dataset [Xie et al., 2020]. |
| Dataset Splits | Yes | Following Trans10K, we use 5000, 1000 and 4428 images in training, validation and testing respectively. |
| Hardware Specification | Yes | We use 8 V100 GPUs for all experiments. |
| Software Dependencies | No | The paper mentions 'We implement Trans2Seg with Pytorch' but does not specify the version number of PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | For loss optimization, we use Adam optimizer with epsilon 1e-8 and weight decay 1e-4. Batch size is 8 per GPU. We set the learning rate to 1e-4, decayed by the poly strategy [Yu et al., 2018], for 50 epochs. [...] For our Trans2Seg, we adopt Transformer architecture and need to keep the shape of learned position embedding same in training/inference, so we directly resize the image to 512×512. A PyTorch sketch of this setup is shown after the conv-head sketch below. |
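
Since the paper describes the small conv head only through the Figure 4 diagram, the following is a minimal illustrative sketch of what such a head might look like: it upsamples the per-class attention maps produced by the transformer decoder and fuses each one with a higher-resolution backbone feature through a few convolutions. The class count, channel widths, layer count, and fusion order are assumptions made for illustration, not details confirmed by the paper; the authoritative implementation is in github.com/xieenze/Trans2Seg.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallConvHead(nn.Module):
    """Illustrative sketch of a 'small conv head': upsample per-class decoder
    attention maps and fuse them with a higher-resolution backbone feature.
    Channel widths and layer count are assumptions, not taken from the paper.
    """

    def __init__(self, feat_channels=256, hidden=64, num_classes=12):
        super().__init__()
        self.num_classes = num_classes
        # Project the shared backbone feature once, reuse it for every class.
        self.proj = nn.Conv2d(feat_channels, hidden, kernel_size=1)
        # Fuse 1 attention channel with the projected feature per class query.
        self.fuse = nn.Sequential(
            nn.Conv2d(1 + hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, attn, feat):
        # attn: (B, num_classes, h, w) attention maps from the decoder
        # feat: (B, feat_channels, H, W) higher-resolution backbone feature
        feat = self.proj(feat)  # (B, hidden, H, W)
        attn = F.interpolate(attn, size=feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        logits = []
        for c in range(self.num_classes):
            # Concatenate this class's attention map with the shared feature.
            x = torch.cat([attn[:, c:c + 1], feat], dim=1)
            logits.append(self.fuse(x))  # (B, 1, H, W)
        return torch.cat(logits, dim=1)  # (B, num_classes, H, W)
```

For Trans10K-v2 one would presumably set `num_classes` to cover the 11 fine-grained transparent categories plus background; that mapping is an assumption here, not stated in the reviewed excerpts.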
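The quoted experiment setup translates fairly directly into PyTorch. The sketch below wires up the reported hyperparameters (Adam with epsilon 1e-8 and weight decay 1e-4, learning rate 1e-4, poly decay over 50 epochs, fixed 512×512 inputs). The poly power of 0.9 and the iteration count derived from the 5000-image training split at batch size 8 on 8 GPUs are assumptions, since the paper does not state them.

```python
import torch
from torch import nn, optim
from torchvision import transforms

# Fixed input size so the learned position embeddings match in train/inference.
resize = transforms.Resize((512, 512))

# Stand-in module; the real Trans2Seg model is at github.com/xieenze/Trans2Seg.
model = nn.Conv2d(3, 12, kernel_size=1)

# Adam with the hyperparameters reported in the paper.
optimizer = optim.Adam(model.parameters(), lr=1e-4, eps=1e-8, weight_decay=1e-4)

# Poly schedule: lr = base_lr * (1 - iter / max_iter) ** power.
# power = 0.9 is the common default for this schedule and an assumption here.
epochs = 50
global_batch = 8 * 8                    # batch size 8 per GPU on 8 V100 GPUs
iters_per_epoch = 5000 // global_batch  # 5000 training images
max_iter = epochs * iters_per_epoch
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1 - it / max_iter) ** 0.9
)

for it in range(max_iter):
    # ... forward pass on a resized 512x512 batch, compute loss, backward ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```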