LION: Linear Group RNN for 3D Object Detection in Point Clouds

Authors: Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of the proposed components and the generalization of our LION on different linear group RNN operators including Mamba, RWKV, and RetNet.
Researcher Affiliation | Collaboration | 1 Huazhong University of Science and Technology; 2 The University of Hong Kong; 3 Baidu Inc., China
Pseudocode | No | The paper presents architectural diagrams (e.g., Figure 2, Figure 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | We plan to provide all code for reproducing the results after the manuscript is accepted.
Open Datasets | Yes | Waymo Open dataset (WOD) [52] is a well-known benchmark for large-scale outdoor 3D perception, comprising 1150 scenes which are divided into 798 scenes for training, 202 scenes for validation, and 150 scenes for testing.
Dataset Splits | Yes | Waymo Open dataset (WOD) [52] is a well-known benchmark for large-scale outdoor 3D perception, comprising 1150 scenes which are divided into 798 scenes for training, 202 scenes for validation, and 150 scenes for testing.
Hardware Specification | Yes | We train our model 24 epochs with a batch size of 16 on 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using specific datasets and following previous methods' settings (e.g., using DSVT's grid size, detection head, loss function, learning rate, and optimizer) but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, CUDA 11.x).
Experiment Setup | Yes | On the WOD, we keep the same channel dimension C = 64 for all LION blocks in LION-Mamba, LION-RWKV, and LION-RetNet... We follow DSVT-Voxel [60] to set the grid size as (0.32m, 0.32m, 0.1875m). The number of LION blocks N is set to 4. For these four LION blocks, the window sizes (Tx, Ty, Tz) are set to (13, 13, 32), (13, 13, 16), (13, 13, 8), and (13, 13, 4), and the corresponding group sizes K are 4096, 2048, 1024, 512, respectively. Besides, we adopt the same center-based detection head and loss function as DSVT [60] for fair comparison. In the voxel generation, we set the ratio r = 0.2 to balance the performance and computation cost.
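
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see how the window sizes and group sizes relate. The following is only a minimal sketch and is not taken from the authors' code. The names LionBlockConfig, window_capacity, and num_groups are hypothetical. Treating Tx * Ty * Tz as the maximum number of voxel cells in a window is an assumption, as is the ceiling division in num_groups. Only the numeric values come from the rows above.

```python
from dataclasses import dataclass
from math import ceil, prod
from typing import Tuple


@dataclass(frozen=True)
class LionBlockConfig:
    """Hypothetical container for the per-block settings quoted above."""
    window_size: Tuple[int, int, int]  # (Tx, Ty, Tz), in voxels
    group_size: int                    # K

    @property
    def window_capacity(self) -> int:
        # Assumption: a window spans at most Tx * Ty * Tz voxel cells.
        return prod(self.window_size)


# Values quoted from the Experiment Setup row (WOD settings).
GRID_SIZE_M = (0.32, 0.32, 0.1875)  # voxel grid size in metres
CHANNELS = 64                       # channel dimension C for all LION blocks
VOXEL_GENERATION_RATIO = 0.2        # ratio r used in voxel generation

LION_BLOCKS = (
    LionBlockConfig((13, 13, 32), 4096),
    LionBlockConfig((13, 13, 16), 2048),
    LionBlockConfig((13, 13, 8), 1024),
    LionBlockConfig((13, 13, 4), 512),
)

# Scene counts quoted from the Dataset Splits row.
WOD_SCENES = {"train": 798, "val": 202, "test": 150}


def num_groups(num_voxels: int, group_size: int) -> int:
    """Hypothetical helper: groups needed to cover num_voxels tokens,
    assuming the final partial group is kept (ceiling division)."""
    return ceil(num_voxels / group_size)


if __name__ == "__main__":
    for i, block in enumerate(LION_BLOCKS, start=1):
        print(f"block {i}: window {block.window_size} "
              f"capacity {block.window_capacity}, K = {block.group_size}")
    print("total WOD scenes:", sum(WOD_SCENES.values()))
```

Under these assumptions, both the window capacity and the group size K halve from one block to the next as Tz shrinks from 32 to 4, and the three scene counts add up to the 1150 scenes stated for WOD.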