OTOcc: Optimal Transport for Occupancy Prediction

Authors: Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Xingchen Zhou, Guang Zhou

IJCAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments show that OTOcc not only achieves competitive prediction performance but also reduces computational overhead by more than 4.58% compared to state-of-the-art methods.
Researcher Affiliation Collaboration (1) College of Computer Science and Software Engineering, Shenzhen University, China; (2) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ); (3) KU Leuven; (4) Deeproute Inc.
Pseudocode Yes Algorithm 1 Optimal Transport for Strategy Optimization
Open Source Code No The paper does not explicitly state that the source code for OTOcc is publicly available or provide a link to it. The provided link is for a dataset.
Open Datasets Yes Occ3D-nuScenes [Tian et al., 2023] contains 700 training scenes and 150 validation scenes. The occupancy scope is defined as -40m to 40m for the x- and y-axes, and -1m to 5.4m for the z-axis in the ego coordinate frame. The voxel size is 0.4m × 0.4m × 0.4m for the occupancy label. The semantic labels contain 17 categories. nuScenes [Caesar et al., 2020] is a large-scale autonomous driving dataset collected in Boston and Singapore. It includes 1000 driving sequences from various scenes, split into 700 in the training set, 150 in the validation set, and 150 in the test set. (Dataset link: https://github.com/Tsinghua-MARS-Lab/Occ3D/)
Dataset Splits Yes Occ3D-nuScenes [Tian et al., 2023] contains 700 training scenes and 150 validation scenes. nuScenes [Caesar et al., 2020] is a large-scale autonomous driving dataset collected in Boston and Singapore. It includes 1000 driving sequences from various scenes, split into 700 in the training set, 150 in the validation set, and 150 in the test set.
Hardware Specification Yes trained the model on 8 NVIDIA A100 GPUs with a batch size of 1 per GPU.
Software Dependencies No The paper mentions optimizers (AdamW), loss functions (cross-entropy, Lovász-softmax), and ResNet101-DCN as the image backbone, but it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes We adopted ResNet101-DCN as the image backbone and trained the model on 8 NVIDIA A100 GPUs with a batch size of 1 per GPU. We utilize the AdamW optimizer with an initial learning rate of 2×10⁻⁴ and a cosine schedule. Additionally, we employ photometric distortion as the data augmentation technique. The initial resolution (X, Y, Z) is (100, 100, 8), and the upsampled voxel features have dimensions (X′, Y′, Z′) = (200, 200, 16). We also set Np = 4, T = 40, λ = 0.2 and C = 256. Note that our proposed OTOcc is trained with only 15 epochs for efficiency.
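The paper's Algorithm 1 ("Optimal Transport for Strategy Optimization") is noted above but not reproduced in this report. For orientation only: entropic optimal transport problems are commonly solved with Sinkhorn iterations, and the sketch below is a generic Sinkhorn solver under that assumption. It is not the paper's algorithm, and every name and parameter in it is illustrative.

```python
import numpy as np

def sinkhorn(cost, r, c, eps=0.1, n_iters=100):
    """Generic entropic-OT solver via Sinkhorn iterations (illustrative only).

    cost: (m, n) cost matrix; r: (m,) source marginal; c: (n,) target marginal.
    Returns an (m, n) transport plan whose marginals approximate r and c.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel from the cost matrix
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)              # scale columns to match target marginal
        u = r / (K @ v)                # scale rows to match source marginal
    v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]
```

Smaller `eps` gives a sparser, closer-to-exact plan at the cost of slower convergence; how (or whether) the paper trades this off is not stated in the quoted material.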
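As a quick arithmetic sanity check on the quoted numbers: assuming the standard Occ3D-nuScenes bounds (x, y in [-40 m, 40 m], z in [-1 m, 5.4 m]; the extracted text above drops the minus signs) and the quoted 0.4 m voxel size, the implied grid dimensions can be derived directly. The helper name is illustrative.

```python
def grid_dims(lo, hi, voxel):
    """Number of voxels along one axis for a scope [lo, hi] and voxel edge."""
    return round((hi - lo) / voxel)

# Assumed Occ3D-nuScenes scope: x, y in [-40 m, 40 m], z in [-1 m, 5.4 m],
# with the 0.4 m voxel size quoted in the dataset description above.
dims = (grid_dims(-40.0, 40.0, 0.4),
        grid_dims(-40.0, 40.0, 0.4),
        grid_dims(-1.0, 5.4, 0.4))
print(dims)  # -> (200, 200, 16)
```

This yields (200, 200, 16), which matches the upsampled voxel resolution (X′, Y′, Z′) stated in the experiment setup.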
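The setup quotes an AdamW optimizer with an initial learning rate of 2×10⁻⁴ and a cosine schedule over 15 epochs. A minimal sketch of such a schedule, assuming per-epoch decay to zero (the paper does not state the step granularity, a warmup, or a floor):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-4, min_lr=0.0):
    """Cosine learning-rate schedule as commonly paired with AdamW.

    base_lr matches the quoted initial rate (2e-4); min_lr and decaying
    per epoch rather than per iteration are assumptions of this sketch.
    """
    t = step / max(total_steps - 1, 1)          # progress in [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

For example, with `total_steps=15` the rate starts at 2e-4, passes the midpoint near 1e-4, and decays toward zero by the final epoch.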