SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Authors: Zhiqian Lan, Yuxuan Jiang, Yao Mu, Chen Chen, Shengbo Eben Li

ICLR 2024

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Extensive experiments demonstrate that SEPT, without elaborate architectural design or manual feature engineering, achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming previous methods on all main metrics by a large margin."
Researcher Affiliation | Academia | "School of Vehicle and Mobility, Tsinghua University. {lanzq21, jyx21}@mails.tsinghua.edu.cn, lishbo@tsinghua.edu.cn"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "The effectiveness of our approach is verified on Argoverse 1 and Argoverse 2, two widely-used large-scale motion forecasting datasets collected from real world."
Dataset Splits | Yes | "In the downstream motion prediction training stage, we train and validate following the split of the Argoverse dataset. ... In the scene understanding training stage, we concatenate train, validation and test dataset as the pretrain dataset with labels dropped." (A data-preparation sketch follows the table.)
Hardware Specification | Yes | "Both stages are trained with a batch size of 96 on a single NVIDIA GeForce RTX 3090 Ti GPU."
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA versions) used for the experiments.
Experiment Setup | Yes | "The model is trained for 150 epochs with a constant learning rate of 2 × 10^-4. ... The model is trained for 50 epochs with the learning rate decayed linearly from 2 × 10^-4 to 0. Both stages are trained with a batch size of 96 ... In our main experiment, we simply use p_MTM = 0.5, p_MRM = 0.5 and T_h = 8/20 (for Argoverse 1/2) without tuning. Table 6 reports the hyperparameters for the SEPT network architecture." (A training-configuration sketch follows the table.)
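The Dataset Splits row quotes that the scene understanding (pretraining) stage concatenates the train, validation, and test splits with labels dropped. The sketch below illustrates that preparation step in PyTorch under stated assumptions: the wrapper class, the split objects, and the `future_trajectories` key are hypothetical names, not taken from the paper.

```python
from torch.utils.data import ConcatDataset, Dataset


class UnlabeledScenes(Dataset):
    """Wrap a motion-forecasting split and drop its ground-truth labels.

    Assumes each sample is a dict; the 'future_trajectories' key is a
    hypothetical name for the supervision signal removed during pretraining.
    """

    def __init__(self, base: Dataset):
        self.base = base

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]
        # Keep only the scene context (agent histories, map); discard futures.
        return {k: v for k, v in sample.items() if k != "future_trajectories"}


# Hypothetical split objects; the paper does not name its dataset classes.
# pretrain_set = ConcatDataset(
#     [UnlabeledScenes(s) for s in (train_set, val_set, test_set)]
# )
```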
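The Experiment Setup row quotes a two-stage schedule: 150 pretraining epochs at a constant learning rate of 2 × 10^-4, then 50 fine-tuning epochs with the rate decayed linearly to 0, both with batch size 96. Below is a minimal sketch of how such a schedule could be configured in PyTorch; the AdamW optimizer and the function name are assumptions, since the quoted text does not specify them.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters quoted from the paper; everything else is an assumption.
BATCH_SIZE = 96
PRETRAIN_EPOCHS = 150      # scene understanding stage, constant learning rate
FINETUNE_EPOCHS = 50       # motion prediction stage, linear decay to zero
BASE_LR = 2e-4
P_MTM, P_MRM = 0.5, 0.5    # masking probabilities, used without tuning
T_H = 8                    # 8 for Argoverse 1, 20 for Argoverse 2


def build_schedules(model: torch.nn.Module):
    # Stage 1: constant learning rate for 150 epochs.
    pretrain_opt = AdamW(model.parameters(), lr=BASE_LR)
    pretrain_sched = LambdaLR(pretrain_opt, lr_lambda=lambda epoch: 1.0)

    # Stage 2: learning rate decays linearly from 2e-4 to 0 over 50 epochs.
    finetune_opt = AdamW(model.parameters(), lr=BASE_LR)
    finetune_sched = LambdaLR(
        finetune_opt,
        lr_lambda=lambda epoch: max(0.0, 1.0 - epoch / FINETUNE_EPOCHS),
    )
    return (pretrain_opt, pretrain_sched), (finetune_opt, finetune_sched)
```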