Trajeglish: Traffic Modeling as Next-Token Prediction
Authors: Jonah Philion, Xue Bin Peng, Sanja Fidler
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report results for rollouts produced by Trajeglish on the official WOMD Sim Agents Benchmark in Sec. 4.1. We then ablate our design choices in simplified full and partial control settings in Sec. 4.2. Finally, we analyze the representations learned by our model and the density estimates it provides in Sec. 4.3. |
| Researcher Affiliation | Collaboration | Jonah Philion (NVIDIA, University of Toronto, Vector Institute), Xue Bin Peng (NVIDIA, Simon Fraser University), Sanja Fidler (NVIDIA, University of Toronto, Vector Institute) |
| Pseudocode | Yes | Pseudocode for this algorithm is included in Alg. 1. |
| Open Source Code | No | The paper mentions a project page for videos and samples, but does not provide an explicit statement or link indicating that the code for the methodology is open-sourced. |
| Open Datasets | Yes | We use the Waymo Open Motion Dataset (WOMD) to evaluate Trajeglish in full and partial control environments. |
| Dataset Splits | Yes | We verify in Fig. 5 that the tokenized action distribution is similar on WOMD train and validation despite the fact that the templates are optimized on the training set. A full description of how we sample from the model for this benchmark with comparisons on the WOMD validation set is included in Appendix A.5. |
| Hardware Specification | Yes | Using these tools, our model takes 2 days to train on 4 A100s. |
| Software Dependencies | No | The paper mentions 'Adam W optimizer' and 'flash attention', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The variant we use for the WOMD benchmark is trained on scenarios with up to 24 agents within 60.0 meters of the origin, up to 96 map objects with map points within 100.0 meters of the origin, 2 map encoder layers, 2 transformer encoder layers, 6 transformer decoder layers, a hidden dimension of 512, trained to predict 32 future timesteps for all agents. We train with a batch size of 96, with a tokenization temperature of 0.008, a tokenization nucleus of 0.95, a top learning rate of 5e-4 with 500 step warmup and linear decay over 800k optimization steps with Adam W optimizer (Loshchilov & Hutter, 2017). |
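
To make the reported experiment setup easier to scan, the hyperparameters quoted in the row above can be collected into a single configuration object. The following is a minimal sketch in Python: the field names and the dataclass structure are illustrative assumptions, not the authors' actual training code; only the numeric values come from the paper's description.

```python
from dataclasses import dataclass


@dataclass
class TrajeglishWOMDConfig:
    """Hyperparameters reported for the WOMD benchmark variant of Trajeglish.

    Field names are hypothetical; the values are taken from the paper's
    experiment setup description.
    """

    # Scenario preprocessing
    max_agents: int = 24                 # agents within 60.0 m of the origin
    agent_radius_m: float = 60.0
    max_map_objects: int = 96            # map objects with points within 100.0 m
    map_radius_m: float = 100.0

    # Architecture
    map_encoder_layers: int = 2
    transformer_encoder_layers: int = 2
    transformer_decoder_layers: int = 6
    hidden_dim: int = 512
    predicted_future_timesteps: int = 32

    # Tokenization
    tokenization_temperature: float = 0.008
    tokenization_nucleus: float = 0.95

    # Optimization (AdamW with 500-step warmup and linear decay)
    batch_size: int = 96
    peak_learning_rate: float = 5e-4
    warmup_steps: int = 500
    total_optimization_steps: int = 800_000


if __name__ == "__main__":
    # Print the configuration as reported in the paper.
    print(TrajeglishWOMDConfig())
```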