SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction
Authors: Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SMART achieves state-of-the-art performance across most of the metrics on the generative Sim Agents challenge, ranking 1st on the leaderboard of the Waymo Open Motion Dataset (WOMD) while demonstrating remarkable inference speed. Moreover, SMART represents the generative model in the autonomous driving motion domain, exhibiting zero-shot generalization capabilities: using only the NuPlan dataset for training and WOMD for validation, SMART achieved a competitive score of 0.72 on the Sim Agents challenge. Lastly, we have collected over 1 billion motion tokens from multiple datasets, validating the model's scalability. |
| Researcher Affiliation | Collaboration | Wei Wu (Tsinghua University; SenseTime Research) wuwei@senseauto.com; Xiaoxin Feng (SenseTime Research) fengxiaoxin@senseauto.com; Ziyan Gao (SenseTime Research) gaoziyan@senseauto.com; Yuheng Kan (SenseTime Research) kanyuheng@senseauto.com |
| Pseudocode | No | The paper describes the model architecture and training tasks but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have released all the code to promote the exploration of models for motion generation in the autonomous driving field. The source code is available at https://github.com/rainmaker22/SMART. |
| Open Datasets | Yes | SMART ranks 1st on the leaderboard of the Waymo Open Motion Dataset (WOMD) while demonstrating remarkable inference speed. Moreover, SMART exhibits zero-shot generalization capabilities: using only the NuPlan dataset for training and WOMD for validation, SMART achieved a competitive score of 0.72 on the Sim Agents challenge. |
| Dataset Splits | Yes | For all experiments, testing used the validation split of WOMD. Overall, we trained models across four sizes, ranging from 1M to 100M parameters, on a training set containing 2.2M scenarios (or 1B motion tokens under 0.5s agent motion tokenization; a back-of-envelope check of these figures appears after this table). |
| Hardware Specification | Yes | Training and inference times were measured on 32 NVIDIA Tesla V100 GPUs; all models in the paper were trained on this setup. Training requires at least 25GB of GPU memory, while inference typically requires only 10GB. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer [23] but does not specify versions for key software components or libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Both the dropout rate and the weight decay rate are set to 0.1. The learning rate is decayed from 0.0002 to 0 with a cosine annealing scheduler. Training includes all vehicles within a scene. The batch size is set to 4, with a maximum GPU memory usage of 30GB. (A minimal sketch of this recipe follows the table.) |
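As a rough consistency check on the figures in the Dataset Splits row, the arithmetic below relates the reported 1B tokens to the 2.2M scenarios, assuming WOMD's standard 9.1 s scenario length; the per-agent and per-scenario counts are our inference, not numbers stated in the paper.

```python
# Back-of-envelope check of the reported token budget. Only the scenario
# count, token count, and 0.5s tokenization come from the paper; the 9.1s
# scenario length is the WOMD standard, and the derived figures are ours.
total_tokens = 1e9              # motion tokens (from the paper)
scenarios = 2.2e6               # training scenarios (from the paper)
tokens_per_agent = 9.1 / 0.5    # ~18 tokens per full agent trajectory

tokens_per_scenario = total_tokens / scenarios                # ~455
agents_per_scenario = tokens_per_scenario / tokens_per_agent  # ~25
print(f"~{tokens_per_scenario:.0f} tokens/scenario, "
      f"~{agents_per_scenario:.0f} tokenized agents/scenario")
```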
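And a minimal PyTorch sketch of the optimization recipe from the Experiment Setup row (AdamW, weight decay 0.1, cosine annealing from 2e-4 to 0, batch size 4); the model module, loss, and total step count here are placeholders, not the authors' released implementation (see the repository linked above for that).

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(128, 128)  # placeholder module; the real model uses dropout 0.1
optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=0.1)  # values from the paper
total_steps = 10_000               # assumed; the paper does not state the step count
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=0.0)  # 2e-4 -> 0

for step in range(total_steps):
    x = torch.randn(4, 128)        # batch size 4, per the paper
    loss = model(x).pow(2).mean()  # dummy loss standing in for next-token cross-entropy
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```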