HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents

Authors: Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our method improves over the state of the art on trajectory forecasting benchmarks, covering both vehicles and pedestrians, by about 20% on average FDE and 50% on road boundary violation rate when predicting a 6-second future. We also conducted human experiments showing that our predicted trajectories received 39.6% more votes than the runner-up approach and 32.2% more votes than our variant without the hallucinative mixed-intent loss.
Researcher Affiliation | Collaboration | Deyao Zhu (1), Mohamed Zahran (1, 2), Li Erran Li (3), Mohamed Elhoseiny (1). Affiliations: (1) King Abdullah University of Science and Technology; (2) Udacity; (3) Alexa AI, Amazon and Columbia University.
Pseudocode | Yes | Algorithm 1 (Training Process): Initialize Enc_{θE}, Dec_{θD}, Dis_{φ}; initialize learning rates α, β; while not converged do ... Algorithm 2 (Detailed Training Process): given prediction horizon T and integration model Inte, initialize Enc_{θE}, Dec_{θD}, Dis_{φ}; initialize learning rates α, β; while not converged do ... A minimal sketch of this loop appears after the table.
Open Source Code | Yes | Code, pretrained models, and preprocessed datasets are available at https://github.com/Vision-CAIR/HalentNet
Open Datasets | Yes | We compare the performance of our method with state-of-the-art models. To demonstrate our method's performance in complex scenarios, we focus our evaluation on the nuScenes dataset (Caesar et al., 2019a), which contains about 1000 driving scenes... In addition, we also evaluate our method on the widely used pedestrian datasets ETH (Pellegrini et al., 2009) and UCY (Leal-Taixé et al., 2014).
Dataset Splits | Yes | We split the data 70%, 15%, and 15% into training, validation, and test sets, respectively. Then, we combine these two sets into one big dataset and train both our method and Trajectron++ from scratch with map information. The evaluation uses an observation period of 8 timesteps (3.2 s) and a prediction horizon of 12 timesteps (4.8 s); see the split sketch after the table.
Hardware Specification | Yes | Training with a pretrained model takes about 16 hours on a single NVIDIA V100 graphics card, and about 24 hours from scratch.
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2014)" but does not specify any software libraries or packages with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The number of latent codes z is set to 25, following Salzmann et al. (2020). Our method is trained for 23 epochs with the pretrained generator and 35 epochs from scratch for vehicles. We set λ = 0.5 to balance the classified latent intent behavior and hallucinative learning. The discriminator's learning rate is lower than the generator's to avoid large gradients at the beginning of training; see the configuration sketch after the table.
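
The quoted pseudocode (Algorithm 1) describes an encoder/decoder generator trained against a discriminator with separate learning rates α and β. A minimal PyTorch-style sketch of that alternating loop is below; the module interfaces, loss forms, and the history/future tensors are illustrative assumptions, not the authors' implementation:

    # Minimal sketch (not the authors' code) of the Algorithm 1 loop:
    # an encoder/decoder pair updated against a discriminator, with
    # learning rates alpha (generator) and beta (discriminator).
    import torch
    import torch.nn.functional as F

    def train_step(enc, dec, disc, history, future, gen_opt, dis_opt, lam=0.5):
        # Generator (Enc + Dec) update: predict the future from encoded history.
        z = enc(history)                    # latent intent code (placeholder)
        pred = dec(z)                       # predicted future trajectory
        fake_logits = disc(pred)
        adv = F.binary_cross_entropy_with_logits(
            fake_logits, torch.ones_like(fake_logits))  # try to fool the discriminator
        g_loss = F.mse_loss(pred, future) + lam * adv
        gen_opt.zero_grad(); g_loss.backward(); gen_opt.step()

        # Discriminator update (detach pred so generator gradients don't flow).
        real, fake = disc(future), disc(pred.detach())
        d_loss = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
                  + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
        dis_opt.zero_grad(); d_loss.backward(); dis_opt.step()
        return g_loss.item(), d_loss.item()

The detach() call is what keeps the two optimizers independent, mirroring the alternating generator/discriminator updates in the pseudocode.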
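For the 70/15/15 split in the Dataset Splits row, a scene-level partition could be sketched as follows; the shuffling, seed, and scenes container are assumptions, as the paper does not specify how the split was drawn. Note that the stated 8-timestep observation (3.2 s) and 12-timestep horizon (4.8 s) imply a 0.4 s step.

    # Hypothetical 70/15/15 scene-level split; seed and ordering are assumptions.
    import random

    def split_scenes(scenes, seed=0):
        idx = list(range(len(scenes)))
        random.Random(seed).shuffle(idx)
        n_train = int(0.70 * len(idx))
        n_val = int(0.15 * len(idx))
        train = [scenes[i] for i in idx[:n_train]]
        val = [scenes[i] for i in idx[n_train:n_train + n_val]]
        test = [scenes[i] for i in idx[n_train + n_val:]]
        return train, val, test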
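The Experiment Setup row pins down a few concrete hyperparameters: 25 discrete latent codes, λ = 0.5, and a discriminator learning rate below the generator's. A hedged configuration sketch, with placeholder modules and assumed learning-rate values, might read:

    # Configuration sketch; the 25 latent codes and lambda = 0.5 come from the
    # paper, while the modules and exact learning rates are assumptions.
    import torch
    import torch.nn as nn

    NUM_LATENT_CODES = 25  # discrete latent intents, following Salzmann et al. (2020)
    LAMBDA = 0.5           # balances classified intent behavior vs. hallucinative learning

    generator = nn.GRU(input_size=2, hidden_size=64)     # placeholder Enc/Dec stack
    discriminator = nn.Linear(64, NUM_LATENT_CODES + 1)  # placeholder intent/fake head

    # Keep the discriminator's learning rate below the generator's to avoid
    # large gradients at the start of training (exact values are assumed).
    gen_opt = torch.optim.Adam(generator.parameters(), lr=3e-4)
    dis_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)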