M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking
Authors: Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, Cai Xu, Can Qin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmarks such as KITTI, nuScenes, and Waymo Open Dataset demonstrate that M3SOT achieves state-of-the-art performance at 38 FPS. |
| Researcher Affiliation | Academia | Jiaming Liu1, Yue Wu1*, Maoguo Gong1, Qiguang Miao1, Wenping Ma1, Cai Xu1, Can Qin2 1Xidian University, China 2Northeastern University, USA {ljm@stu., ywu@, qgmiao@, wpma@mail., cxu@}xidian.edu.cn, gong@ieee.org, qin.ca@northeastern.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our code and models are available at https://github.com/ywu0912/TeamCode.git. |
| Open Datasets | Yes | We compare the proposed M3SOT with state-of-the-art methods on three large datasets: KITTI (Geiger, Lenz, and Urtasun 2012), nuScenes (Caesar et al. 2020), and Waymo Open Dataset (WOD) (Sun et al. 2020). |
| Dataset Splits | Yes | For KITTI, we divide the training sequences into three parts: 0-16 for training, 17-18 for validation, and 19-20 for testing. For the more challenging nuScenes, we use its validation split to evaluate our model, which contains 150 scenarios. See the split sketch after the table. |
| Hardware Specification | Yes | Extensive experiments show that M3SOT achieves state-of-the-art performance on three benchmarks while running at 38 FPS on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly provided. The paper mentions 'DGCNN' and 'X-RPN' as components and 'MindSpore, CANN and Ascend AI Processor' in acknowledgments, but without version details. |
| Experiment Setup | Yes | Implementation Details. We dilate the ground truth BBox by 2 meters to track possible objects in the area. DGCNN (Wang et al. 2019) with different configurations is used as the feature extractor, and X-RPN (Xu et al. 2023a) with the same parameters is used as the localization head. See the dilation sketch after the table. |
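
As a reading aid for the dataset-split row above, here is a minimal sketch of the KITTI tracking-sequence partition the paper describes (sequences 0-16 for training, 17-18 for validation, 19-20 for testing). The function name `split_kitti_sequences` and the assumption that the sequences are indexed 0-20 are illustrative; this is not code from the authors' repository.

```python
# Minimal sketch of the KITTI sequence split reported in the paper.
# `split_kitti_sequences` is a hypothetical helper, not from the M3SOT code.

def split_kitti_sequences(sequence_ids):
    """Partition KITTI tracking sequence IDs into train/val/test subsets."""
    train = [s for s in sequence_ids if 0 <= s <= 16]
    val = [s for s in sequence_ids if 17 <= s <= 18]
    test = [s for s in sequence_ids if 19 <= s <= 20]
    return train, val, test

train_ids, val_ids, test_ids = split_kitti_sequences(range(21))
print(len(train_ids), len(val_ids), len(test_ids))  # 17 2 2
```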
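
The experiment-setup row mentions dilating the ground-truth BBox by 2 meters to cover possible objects in the search area. Below is a hedged sketch of one way such a dilation could be applied to a 7-DoF box laid out as (cx, cy, cz, w, l, h, yaw); the layout, the helper name `enlarge_bbox`, and the convention of growing each extent by the offset are assumptions, and the authors' code may apply the dilation differently.

```python
import numpy as np

# Hypothetical helper illustrating the "dilate the ground-truth BBox by 2 meters"
# step. Assumes a 7-DoF box (cx, cy, cz, w, l, h, yaw); center and yaw are kept,
# and each extent is grown by a fixed offset in meters.

def enlarge_bbox(bbox, offset=2.0):
    """Return a copy of the box with its w/l/h extents grown by `offset` meters."""
    bbox = np.asarray(bbox, dtype=np.float64).copy()
    bbox[3:6] += offset  # grow extents only; center and heading stay fixed
    return bbox

search_box = enlarge_bbox([1.0, 2.0, 0.5, 1.8, 4.2, 1.5, 0.3])
```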