LION: Linear Group RNN for 3D Object Detection in Point Clouds
Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify the effectiveness of the proposed components and the generalization of our LION on different linear group RNN operators including Mamba, RWKV, and RetNet. |
| Researcher Affiliation | Collaboration | ¹Huazhong University of Science and Technology; ²The University of Hong Kong; ³Baidu Inc., China |
| Pseudocode | No | The paper presents architectural diagrams (e.g., Figure 2, Figure 3) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | We plan to provide all code for reproducing the results after the manuscript is accepted. |
| Open Datasets | Yes | Waymo Open dataset (WOD) [52] is a well-known benchmark for large-scale outdoor 3D perception, comprising 1150 scenes which are divided into 798 scenes for training, 202 scenes for validation, and 150 scenes for testing. |
| Dataset Splits | Yes | Waymo Open dataset (WOD) [52] is a well-known benchmark for large-scale outdoor 3D perception, comprising 1150 scenes which are divided into 798 scenes for training, 202 scenes for validation, and 150 scenes for testing. |
| Hardware Specification | Yes | We train our model 24 epochs with a batch size of 16 on 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using specific datasets and following previous methods' settings (e.g., using DSVT's grid size, detection head, loss function, learning rate, and optimizer) but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | On the WOD, we keep the same channel dimension C = 64 for all LION blocks in LION-Mamba, LION-RWKV, and LION-RetNet... We follow DSVT-Voxel [60] to set the grid size as (0.32m, 0.32m, 0.1875m). The number of LION blocks N is set to 4. For these four LION blocks, the window sizes (Tx, Ty, Tz) are set to (13, 13, 32), (13, 13, 16), (13, 13, 8), and (13, 13, 4), and the corresponding group sizes K are 4096, 2048, 1024, 512, respectively. Besides, we adopt the same center-based detection head and loss function as DSVT [60] for fair comparison. In the voxel generation, we set the ratio r = 0.2 to balance the performance and computation cost. |
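For anyone attempting a reproduction, the hyperparameters quoted in the table can be collected into a single configuration object. The Python sketch below is a hypothetical config, not the authors' code (which had not been released at review time). The class name `LionWodConfig` and all field names are our own. The per-GPU batch size assumes that the reported batch size of 16 is the global batch, split evenly over the 8 V100 GPUs. Settings the excerpt elides or only points to DSVT for (learning rate, optimizer, detection head) are deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Tuple

# Waymo Open Dataset scene splits as quoted in the table above.
WOD_SPLITS = {"train": 798, "val": 202, "test": 150}


@dataclass(frozen=True)
class LionWodConfig:
    """Hypothetical container for the LION-on-WOD settings quoted above.

    Field names are illustrative only; they are not taken from a released codebase.
    """
    channels: int = 64  # C, shared by all LION blocks
    grid_size_m: Tuple[float, float, float] = (0.32, 0.32, 0.1875)  # voxel size (x, y, z), following DSVT-Voxel
    num_blocks: int = 4  # N
    window_sizes: Tuple[Tuple[int, int, int], ...] = (
        (13, 13, 32), (13, 13, 16), (13, 13, 8), (13, 13, 4),
    )  # (Tx, Ty, Tz) for each block
    group_sizes: Tuple[int, ...] = (4096, 2048, 1024, 512)  # K for each block
    voxel_gen_ratio: float = 0.2  # r used in voxel generation
    epochs: int = 24
    total_batch_size: int = 16
    num_gpus: int = 8

    def __post_init__(self) -> None:
        # Each block needs exactly one window size and one group size.
        if len(self.window_sizes) != self.num_blocks or len(self.group_sizes) != self.num_blocks:
            raise ValueError("window_sizes and group_sizes must each have num_blocks entries")
        # Assumption: the global batch is split evenly across GPUs.
        if self.total_batch_size % self.num_gpus != 0:
            raise ValueError("total_batch_size must be divisible by num_gpus")

    @property
    def per_gpu_batch_size(self) -> int:
        # Assumes batch size 16 is global, not per GPU.
        return self.total_batch_size // self.num_gpus


if __name__ == "__main__":
    cfg = LionWodConfig()
    for i, (win, k) in enumerate(zip(cfg.window_sizes, cfg.group_sizes)):
        print(f"block {i}: window={win}, K={k}")
    print("per-GPU batch:", cfg.per_gpu_batch_size)
    print("total scenes:", sum(WOD_SPLITS.values()))
```

Running the script prints one line per block, followed by `per-GPU batch: 2` and `total scenes: 1150`. The last value confirms that the three quoted splits add up to the stated 1150 scenes. The table shows that from one block to the next, both Tz and K halve while Tx and Ty stay fixed at 13. Storing them as parallel per-block tuples keeps that pairing explicit.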