Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence

Authors: Zifan Wang, Zhuorui Ye, Haoran Wu, Junyu Chen, Li Yi

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of SCSFNet, we conduct experiments on various benchmarks including two large-scale indoor benchmarks we contributed and the outdoor Semantic KITTI benchmark. Extensive experiments show SCSFNet outperforms baseline methods on multiple metrics by a large margin...
Researcher Affiliation | Academia | Tsinghua University, Shanghai Artificial Intelligence Laboratory, Shanghai Qi Zhi Institute {wzf22,yezr21,wuhr20,junyu-ch21}@mails.tsinghua.edu.cn, ericyi@mail.tsinghua.edu.cn
Pseudocode | No | The information is insufficient. The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The project page with code is available at scsfnet.github.io.
Open Datasets | Yes | "To generate the simulated scenes suitable for our task, we specify the desired configuration of the environment in the logic language BDDL. ... In our datasets, we provide 2D pictures (RGB, normal and semantics), 3D point clouds, 3D meshes, 3D visibility grids and 3D ground truth voxel grids. The train/test split is 80%/20%." and "Semantic KITTI (Behley et al. 2019) is a very challenging and well-known large-scale outdoor dataset collected by autonomous cars."
Dataset Splits | No | The information is insufficient. The paper states "The train/test split is 80%/20%" for its datasets but does not explicitly mention a validation split.
Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details used for running experiments.
Software Dependencies | No | The information is insufficient. The paper mentions "Minkowski Engine" but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For IGPLAY and IGNAV, the dimensions of the 3D space are 4.8m horizontally, 2.88m vertically, and 4.8m in depth. We use 3 RGB frames (interval: 1 iGibson timestep) as the input and a 240 × 144 × 240 volume with grid size 0.02m as the ground truth, which is similar to (Song et al. 2017). For Semantic KITTI, the dimensions of the 3D space are 51.2m ahead of the car, 25.6m to every side of the vehicle, and 6.4m in height. We input three frames of point clouds (0.2s interval) and use a 256 × 256 × 32 volume with a 0.2m grid size as the ground truth, provided by Semantic KITTI for the semantic scene completion benchmark.
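As a sanity check on the reported setup, the ground-truth voxel grid shapes follow directly from the stated spatial extents and grid sizes (extent divided by voxel size along each axis). The sketch below is illustrative only and not from the paper; the helper name `voxel_shape` is our own.

```python
# Illustrative sketch: derive the ground-truth voxel grid shapes reported
# above from the stated spatial extents (in meters) and grid sizes.

def voxel_shape(extents_m, grid_size_m):
    """Number of voxels along each axis = spatial extent / voxel size."""
    return tuple(round(e / grid_size_m) for e in extents_m)

# IGPLAY / IGNAV: 4.8 m (horizontal) x 2.88 m (vertical) x 4.8 m (depth)
# at 0.02 m resolution.
igplay_shape = voxel_shape((4.8, 2.88, 4.8), 0.02)   # (240, 144, 240)

# Semantic KITTI: 51.2 m ahead, 25.6 m to each side (51.2 m total width),
# 6.4 m in height, at 0.2 m resolution.
kitti_shape = voxel_shape((51.2, 51.2, 6.4), 0.2)    # (256, 256, 32)

print(igplay_shape, kitti_shape)
```

Both results match the 240 × 144 × 240 and 256 × 256 × 32 volumes quoted from the paper.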