Label-efficient Semantic Scene Completion with Scribble Annotations
Authors: Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against the fully-supervised counterparts, showing 99% performance of the fully-supervised models with only 13.5% voxels labeled. |
| Researcher Affiliation | Collaboration | Song Wang¹, Jiawei Yu¹, Wentong Li¹, Hao Shi¹, Kailun Yang³, Junbo Chen², and Jianke Zhu¹; ¹Zhejiang University, ²Udeer.ai, ³Hunan University |
| Pseudocode | No | The paper describes its methods in text and figures but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Both annotations of ScribbleSC and our full implementation are available at https://github.com/songwzju/Scribble2Scene. |
| Open Datasets | Yes | Our models are trained on ScribbleSC. Unless specified, the performance is mainly evaluated on the validation set of the fully-annotated SemanticKITTI [Behley et al., 2019], which is a highly challenging benchmark. All input images come from the KITTI Odometry Benchmark [Geiger et al., 2012] consisting of 22 sequences. ... In this work, we make full use of the sparse annotations in ScribbleKITTI [Unal et al., 2022] to generate scribble-based semantic occupancy labels combined with the dense geometric structure to construct a new benchmark called ScribbleSC. |
| Dataset Splits | Yes | Following the official setting, we use the sequences 00-10 except 08 for training with ScribbleSC, while sequence 08 is preserved as the validation set. (See the split sketch below the table.) |
| Hardware Specification | No | The paper reports only GPU counts, without specifying GPU models or other hardware details: For Dean-Labeler, we adopt Cylinder3D [Zhu et al., 2021a] as the SCN backbone and use a single GPU to train the network with a batch size of 4. For Teacher-Labeler and student model, we use the same backbone of VoxFormer-T [Li et al., 2023c], which takes the current and previous 4 images as input. All models based on VoxFormer are trained on 4 GPUs with 20 epochs, a batch size of 1 (containing 5 images) per GPU. |
| Software Dependencies | No | The paper mentions software components such as 'Cylinder3D' and 'VoxFormer-T' as models or backbones, but does not provide specific version numbers for these or for general software dependencies such as the programming language, deep learning framework, or libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For Dean-Labeler, we adopt Cylinder3D [Zhu et al., 2021a] as the SCN backbone and use a single GPU to train the network with a batch size of 4. For Teacher-Labeler and student model, we use the same backbone of VoxFormer-T [Li et al., 2023c], which takes the current and previous 4 images as input. All models based on VoxFormer are trained on 4 GPUs with 20 epochs, a batch size of 1 (containing 5 images) per GPU. (See the configuration sketch below the table.) |
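
To make the split reported in the Dataset Splits row concrete, here is a minimal Python sketch of the SemanticKITTI sequence partition (sequences 00-10 except 08 for training, sequence 08 for validation). The helper name `get_split` and the list-of-strings return format are illustrative assumptions, not taken from the Scribble2Scene repository.

```python
# Minimal sketch of the SemanticKITTI sequence split used for ScribbleSC
# (official setting: train on sequences 00-10 except 08, validate on 08).
# The helper name and return format are illustrative, not from the paper's code.

TRAIN_SEQUENCES = [f"{i:02d}" for i in range(11) if i != 8]  # '00'-'10' without '08'
VAL_SEQUENCES = ["08"]

def get_split(split: str) -> list[str]:
    """Return the KITTI Odometry sequence IDs for the requested split."""
    if split == "train":
        return TRAIN_SEQUENCES
    if split == "val":
        return VAL_SEQUENCES
    raise ValueError(f"Unknown split: {split!r}")

if __name__ == "__main__":
    print(get_split("train"))  # ['00', '01', ..., '07', '09', '10']
    print(get_split("val"))    # ['08']
```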
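
Similarly, the training settings quoted in the Experiment Setup row can be summarized as a small configuration sketch. The dictionary layout and key names below are assumptions made for illustration; they do not mirror the repository's actual config files.

```python
# Illustrative summary of the reported training setup; key names are assumptions.

DEAN_LABELER_CFG = {
    "backbone": "Cylinder3D",   # SCN backbone for Dean-Labeler
    "num_gpus": 1,
    "batch_size": 4,
}

VOXFORMER_CFG = {
    "backbone": "VoxFormer-T",  # shared by Teacher-Labeler and the student model
    "temporal_inputs": 5,       # current image plus the previous 4 images
    "num_gpus": 4,
    "epochs": 20,
    "batch_size_per_gpu": 1,    # each sample contains 5 images
}
```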