Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

Authors: Zhejun Zhang, Alexander Liniger, Christos Sakaridis, Fisher Yu, Luc Van Gool

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Waymo and Argoverse-2 datasets show that HPTR achieves superior performance among end-to-end methods that do not apply expensive post-processing or model ensembling.
Researcher Affiliation | Collaboration | Zhejun Zhang (Computer Vision Lab, ETH Zurich, Switzerland; zhejun.zhang@vision.ee.ethz.ch), Alexander Liniger (Computer Vision Lab, ETH Zurich, Switzerland; alex.liniger@vision.ee.ethz.ch), Christos Sakaridis (Computer Vision Lab, ETH Zurich, Switzerland; csakarid@vision.ee.ethz.ch), Fisher Yu (Computer Vision Lab, ETH Zurich, Switzerland; i@yf.io), Luc Van Gool (CVL, ETH Zurich, Switzerland; PSI, KU Leuven, Belgium; INSAIT, University of Sofia, Bulgaria; vangool@vision.ee.ethz.ch). This work is funded by Toyota Motor Europe via the research project TRACE-Zürich.
Pseudocode | No | Figure 7 in the appendix illustrates the implementation of KNARPE with matrix operations, but it is a diagram, not pseudocode in a structured, step-by-step textual format. (A hedged code sketch of the KNARPE attention is given after this table.)
Open Source Code | Yes | The code is available at https://github.com/zhejz/HPTR.
Open Datasets | Yes | We benchmark our method on the two most popular datasets: the Waymo Open Motion Dataset (WOMD) [16] and the Argoverse-2 motion forecasting dataset (AV2) [60].
Dataset Splits | Yes | For WOMD, we randomly sample 25% of all training episodes at each epoch; for AV2 we use 50%. (This per-epoch subsampling appears in the training sketch after the table.)
Hardware Specification | Yes | We train with a total batch size of 12 episodes on 4 RTX 2080Ti GPUs.
Software Dependencies | No | We use standard Ubuntu, Python and PyTorch without optimizing for real-time deployment. (No version numbers are provided for these software dependencies.)
Experiment Setup | Yes | We use the AdamW optimizer with an initial learning rate of 1e-4, decayed by 0.5 every 25 epochs. We train with a total batch size of 12 episodes... Our final models are trained for 120 epochs for WOMD and 150 epochs for AV2. The base number of neighbors considered by KNARPE is K = 36. This number is multiplied by γTL = 2, γAG = 4 and γAC = 10 for the enhance-TL, enhance-AG and AC-to-all Transformers, respectively. We set NAC = 6 to predict exactly 6 futures as specified by the leaderboard. (A hedged sketch of this configuration follows the table.)
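
Since the paper provides no textual pseudocode for KNARPE (K-nearest Neighbors Attention with Relative Pose Encoding), the following is a minimal PyTorch sketch of the mechanism as described: each token attends only to its K nearest neighbors, and the relative pose between the query token and each neighbor is encoded and added to the keys and values. The tensor shapes, the pose-encoding MLPs, and all names here are illustrative assumptions (single head, no batching); the authors' actual implementation is at https://github.com/zhejz/HPTR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KNARPE(nn.Module):
    """Hedged sketch of KNARPE: KNN attention with relative pose encoding.

    Single-head and unbatched for brevity; not the authors' implementation.
    """

    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.k = k
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Assumed form of the RPE: an MLP mapping the relative pose
        # (dx, dy, cos(dh), sin(dh)) to biases added to keys and values.
        self.rpe_k = nn.Sequential(nn.Linear(4, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))
        self.rpe_v = nn.Sequential(nn.Linear(4, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))

    def forward(self, x: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # x: (N, d_model) token features; pose: (N, 3) as (x, y, heading).
        n, d = x.shape
        k = min(self.k, n)
        # K nearest neighbors of every token, by 2D distance (self included).
        dist = torch.cdist(pose[:, :2], pose[:, :2])              # (N, N)
        knn = dist.topk(k, largest=False).indices                 # (N, K)
        # Relative pose of each neighbor j, expressed in the frame of token i.
        rel_xy = pose[knn, :2] - pose[:, None, :2]                # (N, K, 2)
        cos_h, sin_h = pose[:, 2].cos(), pose[:, 2].sin()
        rot = torch.stack([torch.stack([cos_h, sin_h], -1),
                           torch.stack([-sin_h, cos_h], -1)], -2) # (N, 2, 2)
        rel_xy = torch.einsum('nij,nkj->nki', rot, rel_xy)
        rel_h = pose[knn, 2] - pose[:, None, 2]                   # (N, K)
        rel = torch.cat([rel_xy, rel_h.cos()[..., None],
                         rel_h.sin()[..., None]], dim=-1)         # (N, K, 4)
        # Attention over the K neighbors, with RPE added to keys and values.
        q = self.q_proj(x)                                        # (N, d)
        kk = self.k_proj(x)[knn] + self.rpe_k(rel)                # (N, K, d)
        vv = self.v_proj(x)[knn] + self.rpe_v(rel)                # (N, K, d)
        attn = F.softmax((q[:, None] * kk).sum(-1) / d ** 0.5, dim=-1)
        return (attn[..., None] * vv).sum(dim=1)                  # (N, d)

# Example: 10 tokens with 64-dim features and (x, y, heading) poses.
out = KNARPE(d_model=64, k=36)(torch.randn(10, 64), torch.randn(10, 3))
```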
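
Similarly, the Dataset Splits and Experiment Setup rows translate into roughly the following training configuration. This is a hedged sketch under stated assumptions: the model, dataset, and loss are placeholders, and the per-epoch subsampling is one plausible reading of "randomly sample 25% ... at each epoch"; see the released code for the authors' actual setup.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

model = torch.nn.Linear(8, 8)                   # placeholder for the HPTR model
dataset = TensorDataset(torch.randn(1000, 8))   # placeholder for WOMD episodes

# AdamW at lr 1e-4, halved every 25 epochs, as reported above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

# Neighbor budgets: base K = 36, scaled per Transformer as reported above.
K_BASE = 36
K = {"enhance_TL": 2 * K_BASE, "enhance_AG": 4 * K_BASE, "AC_to_all": 10 * K_BASE}
N_AC = 6              # anchors, matching the 6 futures required by the leaderboard

EPOCHS = 120          # 120 epochs for WOMD, 150 for AV2
TOTAL_BATCH = 12      # 12 episodes in total, split across 4 RTX 2080Ti GPUs
SAMPLE_FRAC = 0.25    # 25% of training episodes per epoch for WOMD (50% for AV2)

for epoch in range(EPOCHS):
    # Draw a fresh random subset of the training episodes each epoch.
    subset = torch.randperm(len(dataset))[: int(SAMPLE_FRAC * len(dataset))]
    loader = DataLoader(dataset, batch_size=TOTAL_BATCH,
                        sampler=SubsetRandomSampler(subset.tolist()))
    for (batch,) in loader:
        loss = model(batch).pow(2).mean()       # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```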