Label-efficient Semantic Scene Completion with Scribble Annotations

Authors: Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on SemanticKITTI demonstrate that Scribble2Scene achieves competitive performance against the fully-supervised counterparts, reaching 99% of the fully-supervised models' performance with only 13.5% of voxels labeled.
Researcher Affiliation | Collaboration | Song Wang1, Jiawei Yu1, Wentong Li1, Hao Shi1, Kailun Yang3, Junbo Chen2 and Jianke Zhu1; 1Zhejiang University, 2Udeer.ai, 3Hunan University
Pseudocode | No | The paper describes its methods in text and figures but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Both the annotations of ScribbleSC and our full implementation are available at https://github.com/songwzju/Scribble2Scene.
Open Datasets | Yes | Our models are trained on ScribbleSC. Unless specified, the performance is mainly evaluated on the validation set of the fully-annotated SemanticKITTI [Behley et al., 2019], which is a highly challenging benchmark. All input images come from the KITTI Odometry Benchmark [Geiger et al., 2012], consisting of 22 sequences. ... In this work, we make full use of the sparse annotations in ScribbleKITTI [Unal et al., 2022] to generate scribble-based semantic occupancy labels, combined with the dense geometric structure, to construct a new benchmark called ScribbleSC.
Dataset Splits | Yes | Following the official setting, we use sequences 00-10 except 08 for training with ScribbleSC, while sequence 08 is preserved as the validation set.
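The split quoted above can be sketched in a few lines; this is an illustrative reconstruction of the sequence lists, not code from the released repository:

```python
# SemanticKITTI split as described in the paper: sequences 00-10 are used
# for training, except sequence 08, which is held out for validation.
TRAIN_SEQUENCES = [f"{i:02d}" for i in range(11) if i != 8]
VAL_SEQUENCES = ["08"]

# 10 training sequences, with 08 reserved for validation.
assert len(TRAIN_SEQUENCES) == 10 and "08" not in TRAIN_SEQUENCES
```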
Hardware Specification | No | The quoted setup gives GPU counts but no specific hardware models: For Dean-Labeler, we adopt Cylinder3D [Zhu et al., 2021a] as the SCN backbone and use a single GPU to train the network with a batch size of 4. For Teacher-Labeler and the student model, we use the same backbone of VoxFormer-T [Li et al., 2023c], which takes the current and previous 4 images as input. All models based on VoxFormer are trained on 4 GPUs for 20 epochs, with a batch size of 1 (containing 5 images) per GPU.
Software Dependencies | No | The paper mentions software components like 'Cylinder3D' and 'VoxFormer-T' as models or backbones, but does not provide specific version numbers for these or for general software dependencies such as programming languages, deep learning frameworks, or libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For Dean-Labeler, we adopt Cylinder3D [Zhu et al., 2021a] as the SCN backbone and use a single GPU to train the network with a batch size of 4. For Teacher-Labeler and the student model, we use the same backbone of VoxFormer-T [Li et al., 2023c], which takes the current and previous 4 images as input. All models based on VoxFormer are trained on 4 GPUs for 20 epochs, with a batch size of 1 (containing 5 images) per GPU.
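The quoted experiment setup can be summarized as plain configuration dicts. This is a hedged sketch for readability only: all key names are illustrative and do not come from the released Scribble2Scene code.

```python
# Illustrative restatement of the training setup quoted from the paper.
# Key names are hypothetical; only the values are taken from the text.
dean_labeler_cfg = {
    "backbone": "Cylinder3D",   # SCN backbone [Zhu et al., 2021a]
    "num_gpus": 1,
    "batch_size": 4,
}

voxformer_cfg = {
    "backbone": "VoxFormer-T",  # [Li et al., 2023c]
    "input_frames": 5,          # current image + previous 4 images
    "num_gpus": 4,
    "epochs": 20,
    "batch_size_per_gpu": 1,    # each sample contains 5 images
}
```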