Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals
Authors: Shaoheng Fang, Zuhong Liu, Mingyu Wang, Chenxin Xu, Yiqi Zhong, Siheng Chen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations conducted on the nuScenes (Caesar et al. 2020) dataset demonstrate that our proposed methodology improves upon previous self-supervised approaches by up to 40%. Notably, our method achieves performance comparable to weakly-supervised and fully-supervised methods. |
| Researcher Affiliation | Academia | Shaoheng Fang (1), Zuhong Liu (1), Mingyu Wang (2), Chenxin Xu (1), Yiqi Zhong (3), Siheng Chen (1, 4). Affiliations: (1) Shanghai Jiao Tong University; (2) University of Chinese Academy of Sciences; (3) University of Southern California; (4) Shanghai AI Laboratory |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement about code release. |
| Open Datasets | Yes | We evaluate our approach on the nuScenes (Caesar et al. 2020) dataset. nuScenes contains 1000 scenes, each of which has 20 seconds of LiDAR point cloud sequences and multi-view camera videos annotated at 2Hz. (See the devkit loading sketch after this table.) |
| Dataset Splits | Yes | Following the setting in previous works for fair comparisons (Wu, Chen, and Metaxas 2020; Wang et al. 2022; Luo, Yang, and Yuille 2021; Li et al. 2023; Jia et al. 2023), we adopt 500 scenes for training, 100 scenes for validation, and 250 scenes for testing. |
| Hardware Specification | Yes | All models are trained on four NVIDIA 3090 GPUs with a batch size of 64. |
| Software Dependencies | No | The paper mentions "we employ (Teed and Deng 2020) as the optical flow estimation model with the pretrained parameters offered by PyTorch," but it does not specify version numbers for PyTorch or any other software dependencies. (See the optical flow sketch after this table.) |
| Experiment Setup | Yes | The input point clouds are cropped within a range of [-32, 32] × [-32, 32] × [-3, 2] meters, and the BEV output map is 256 × 256 in size... The static/dynamic classification thresholds in Eq. 5 are τ_2D = 5 pixels and τ_3D = 1 m... For the training loss in Eq. 10, we set λ_mc = 1, λ_pr = 0.1, and λ_tc = 0.4. We employ the AdamW (Loshchilov and Hutter 2017) optimization algorithm for training... We train the model for 100 epochs with an initial learning rate of 0.008, and we decay the learning rate by 0.5 every 20 epochs. (A sketch of this training recipe follows the table.) |
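For the dataset rows, a minimal loading sketch using the official `nuscenes-devkit`. The `dataroot` path is a hypothetical placeholder, and the slicing at the end is illustrative only: the paper's 500/100/250 train/val/test split sums to the 850 annotated scenes of the trainval release and follows the partition used by the cited prior works, which the devkit does not provide out of the box.

```python
from nuscenes.nuscenes import NuScenes

# Hypothetical local path to an extracted nuScenes v1.0-trainval download.
nusc = NuScenes(version='v1.0-trainval', dataroot='/data/sets/nuscenes', verbose=True)

# The trainval release holds 850 of the 1000 scenes; each scene is ~20 s of
# LiDAR sweeps and multi-view camera video with keyframe annotations at 2 Hz.
print(len(nusc.scene))  # 850

# Illustrative 500/100/250 partition of the 850 trainval scenes; the paper
# adopts the specific scene assignment of prior works, not this ordering.
scene_tokens = [s['token'] for s in nusc.scene]
train, val, test = scene_tokens[:500], scene_tokens[500:600], scene_tokens[600:]
```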
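For the software-dependencies row, the cited optical flow model (Teed and Deng 2020) is RAFT, and "pretrained parameters offered by PyTorch" most plausibly refers to the weights shipped with torchvision. A minimal sketch under that assumption; the input frames here are random placeholder tensors:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Load RAFT (Teed and Deng 2020) with torchvision's pretrained weights.
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# Placeholder image batches of shape (N, 3, H, W); H and W must be
# divisible by 8. The weights' preset transforms handle normalization.
frame1 = torch.rand(1, 3, 360, 640)
frame2 = torch.rand(1, 3, 360, 640)
frame1, frame2 = weights.transforms()(frame1, frame2)

with torch.no_grad():
    # RAFT returns a list of iteratively refined flow fields;
    # the last entry is the final estimate, shaped (N, 2, H, W) in pixels.
    flow = model(frame1, frame2)[-1]
```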
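And for the experiment-setup row, a sketch of the reported optimization recipe in PyTorch. The tiny linear model, random batch, and loss stand-ins are hypothetical placeholders for the authors' BEV network and nuScenes batches; only the optimizer, learning-rate schedule, epoch count, and Eq. 10 loss weights come from the paper.

```python
import torch

# Placeholder model standing in for the BEV motion prediction network.
model = torch.nn.Linear(256, 256)

# Loss weights of Eq. 10 as reported: λ_mc = 1, λ_pr = 0.1, λ_tc = 0.4.
lambda_mc, lambda_pr, lambda_tc = 1.0, 0.1, 0.4

optimizer = torch.optim.AdamW(model.parameters(), lr=0.008)
# Decay the learning rate by 0.5 every 20 epochs over 100 epochs of training.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(100):
    inputs = torch.rand(64, 256)  # placeholder batch (reported batch size: 64)
    outputs = model(inputs)
    # Generic stand-ins for the three loss terms combined in Eq. 10.
    l_mc = outputs.pow(2).mean()
    l_pr = outputs.abs().mean()
    l_tc = (outputs - inputs).pow(2).mean()
    loss = lambda_mc * l_mc + lambda_pr * l_pr + lambda_tc * l_tc
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```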